Blip 2
Introduction
This notebook has been automatically translated to make it accessible to more people; please let me know if you spot any typos.
Blip2 is an artificial intelligence model that takes an image or a video as input and can hold a conversation about it, answering questions or describing what the input shows with remarkable accuracy 🤯
Installation
In order to install this tool, it is best to create a new anaconda environment.
!$ conda create -n blip2 python=3.9
Now we activate the environment
!$ conda activate blip2
We install all the necessary modules
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
Finally we install blip2
!$ pip install salesforce-lavis
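To make sure everything is installed correctly, you can run a quick optional check (a small sketch, assuming the blip2 environment is still active): it should print the PyTorch version, whether CUDA is visible, and import lavis without errors.
!$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
!$ python -c "import lavis; print('lavis ok')"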
Use
We load the necessary libraries
import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess
We load an example image
img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg/800px-12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
display(raw_image.resize((500, 500)))
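If you prefer to use one of your own images instead of downloading one, you can load it from disk; the file name below is just an example.
# Alternative: load a local file instead of downloading one (the path is hypothetical)
# raw_image = Image.open("my_image.jpg").convert('RGB')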
We select the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
device(type='cuda')
We assign a model. In my case, with 32 GB of RAM and a 3060 GPU with 12 GB of VRAM, I cannot use all of them, so next to each model I have added a comment: ok for the ones I was able to use and, for the ones that failed, the error they gave me. If your computer has the same RAM and VRAM you already know which ones you can use; if not, you will have to test them yourself (a snippet after the next code block shows how to list every model that LAVIS provides).
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok
# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM
# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM
# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM
# name = "blip2"; model_type = "pretrain" # FAIL type error
# name = "blip2"; model_type = "coco" # ok
name = "blip2_t5"; model_type = "pretrain_flant5xl" # ok
# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM
# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAIL
model, vis_processors, _ = load_model_and_preprocess(
name=name, model_type=model_type, is_eval=True, device=device
)
vis_processors.keys()
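If you want to see every architecture and model type that your LAVIS installation ships with, you can print its model zoo. This is a minimal sketch; model_zoo is exposed by lavis.models in recent releases, so check it against your installed version.
from lavis.models import model_zoo
print(model_zoo)  # table of architectures and their available model types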
We prepare the image to feed it into the model.
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM# name = "blip2"; model_type = "pretrain" # FAIL type error# name = "blip2"; model_type = "coco" # okname = "blip2_t5"; model_type = "pretrain_flant5xl" # ok# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAILmodel, vis_processors, _ = load_model_and_preprocess(name=name, model_type=model_type, is_eval=True, device=device)vis_processors.keys()image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
We analyze the image without asking any questions
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM# name = "blip2"; model_type = "pretrain" # FAIL type error# name = "blip2"; model_type = "coco" # okname = "blip2_t5"; model_type = "pretrain_flant5xl" # ok# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAILmodel, vis_processors, _ = load_model_and_preprocess(name=name, model_type=model_type, is_eval=True, device=device)vis_processors.keys()image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)model.generate({"image": image})
['a black and white snake']
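generate also accepts sampling parameters; as an example (the parameter names below are taken from the LAVIS examples, so check them against your installed version) you can ask for several caption candidates with nucleus sampling.
# Generate 3 caption candidates with nucleus sampling instead of beam search
model.generate({"image": image}, use_nucleus_sampling=True, num_captions=3)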
We analyze the image by asking it questions. To keep the conversation coherent, each answer is appended to the prompt before the next question.
prompt = None
def prepare_prompt(prompt, question):
    # Append the new question to the running prompt (or start a new one)
    if prompt is None:
        prompt = question + " Answer:"
    else:
        prompt = prompt + " " + question + " Answer:"
    return prompt
def get_answer(prompt, question, model):
    # Ask the model and append its answer to the prompt so the conversation keeps its context
    prompt = prepare_prompt(prompt, question)
    answer = model.generate({"image": image, "prompt": prompt})
    answer = answer[0]
    prompt = prompt + " " + answer + "."
    return prompt, answer
question = "What's in the picture?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Question: What's in the picture?
Answer: a snake
question = "What kind of snake?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: What kind of snake?
Answer: cobra
question = "Is it poisonous?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: Is it poisonous?
Answer: yes
question = "If it bites me, can I die?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: If it bites me, can I die?
Answer: yes
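Since prompt accumulates each question and answer, you can chain a whole conversation in a loop. This is just a usage sketch built on the helpers defined above.
# Start a fresh conversation and ask several questions in a row
prompt = None
questions = [
    "What's in the picture?",
    "What kind of snake?",
    "Is it poisonous?",
]
for question in questions:
    prompt, answer = get_answer(prompt, question, model)
    print(f"Question: {question}")
    print(f"Answer: {answer}")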