Blip2

Blip2 Blip2

Blip 2link image 6

Introductionlink image 7

This notebook has been automatically translated to make it accessible to more people, please let me know if you see any typos.

Blip2 is an artificial intelligence that is capable of taking an image or video as input and having a conversation and answering questions or delivering context of what this input shows in a very accurate way 🤯

GitHub

Paper

Installationlink image 8

In order to install this tool, it is best to create a new anaconda environment.

	
!$ conda create -n blip2 python=3.9
Copy

Now we get into the environment

	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
Copy

We install all the necessary modules

	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
Copy
	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
Copy
	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
Copy
	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
Copy

Finally we install blip2

	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
!$ pip install salesforce-lavis
Copy

Uselink image 9

We load the necessary libraries

	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
!$ pip install salesforce-lavis
import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess
Copy

We load an example image

img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg/800px-12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg'
      raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')   
      display(raw_image.resize((500, 500)))
      
image blip2 1

We set the GPU if any

	
!$ conda create -n blip2 python=3.9
!$ conda activate blip2
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
!$ pip install salesforce-lavis
import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess
img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg/800px-12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
display(raw_image.resize((500, 500)))
device = torch.device("cuda" if torch.cuda.is_available() else 'cpu')
device
Copy
	
device(type='cuda')

We assign a model. In my case that I have a computer with 32 GB of RAM and a GPU 3060 with 12 GB of VRAM I can not use all of them, so I have put next to a comment ok with the models that I have been able to use, and those that not, the error that it has given me. If you have a computer with the same RAM and VRAM you already know which ones you can use, if not you have to test

# name = "blip2_opt"; model_type = "pretrain_opt2.7b"           # ok
      # name = "blip2_opt"; model_type = "caption_coco_opt2.7b"       # FAIL VRAM
      # name = "blip2_opt"; model_type = "pretrain_opt6.7b"           # FAIL RAM
      # name = "blip2_opt"; model_type = "caption_coco_opt6.7b"       # FAIL RAM
      
      # name = "blip2"; model_type = "pretrain"                       # FAIL type error
      # name = "blip2"; model_type = "coco"                           # ok
      
      name = "blip2_t5"; model_type = "pretrain_flant5xl"           # ok
      # name = "blip2_t5"; model_type = "caption_coco_flant5xl"       # FAIL VRAM
      # name = "blip2_t5"; model_type = "pretrain_flant5xxl"          # FAIL
      
      model, vis_processors, _ = load_model_and_preprocess(
          name=name, model_type=model_type, is_eval=True, device=device
      )
      
      vis_processors.keys()
      
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Out[4]:
dict_keys(['train', 'eval'])

We prepare the image for inserting it into the model.

	
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok
# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM
# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM
# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM
# name = "blip2"; model_type = "pretrain" # FAIL type error
# name = "blip2"; model_type = "coco" # ok
name = "blip2_t5"; model_type = "pretrain_flant5xl" # ok
# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM
# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAIL
model, vis_processors, _ = load_model_and_preprocess(
name=name, model_type=model_type, is_eval=True, device=device
)
vis_processors.keys()
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
Copy

We analyze the image without asking any questionslink image 10

	
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok
# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM
# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM
# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM
# name = "blip2"; model_type = "pretrain" # FAIL type error
# name = "blip2"; model_type = "coco" # ok
name = "blip2_t5"; model_type = "pretrain_flant5xl" # ok
# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM
# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAIL
model, vis_processors, _ = load_model_and_preprocess(
name=name, model_type=model_type, is_eval=True, device=device
)
vis_processors.keys()
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
model.generate({"image": image})
Copy
	
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
['a black and white snake']

We analyze the image by askinglink image 11

	
prompt = None
Copy
	
prompt = None
def prepare_prompt(prompt, question):
if prompt is None:
prompt = question + " Answer:"
else:
prompt = prompt + " " + question + " Answer:"
return prompt
Copy
	
prompt = None
def prepare_prompt(prompt, question):
if prompt is None:
prompt = question + " Answer:"
else:
prompt = prompt + " " + question + " Answer:"
return prompt
def get_answer(prompt, question, model):
prompt = prepare_prompt(prompt, question)
answer = model.generate(
{
"image": image,
"prompt": prompt
}
)
answer = answer[0]
prompt = prompt + " " + answer + "."
return prompt, answer
Copy
	
prompt = None
def prepare_prompt(prompt, question):
if prompt is None:
prompt = question + " Answer:"
else:
prompt = prompt + " " + question + " Answer:"
return prompt
def get_answer(prompt, question, model):
prompt = prepare_prompt(prompt, question)
answer = model.generate(
{
"image": image,
"prompt": prompt
}
)
answer = answer[0]
prompt = prompt + " " + answer + "."
return prompt, answer
question = "What's in the picture?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: What's in the picture?
Answer: a snake
	
question = "What kind of snake?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: What kind of snake?
Answer: cobra
	
question = "Is it poisonous?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: Is it poisonous?
Answer: yes
	
question = "If it bites me, can I die?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Copy
	
Question: If it bites me, can I die?
Answer: yes

Continue reading

Last posts -->

Have you seen these projects?

Subtify

Subtify Subtify

Subtitle generator for videos in the language you want. Also, it puts a different color subtitle to each person

View all projects -->

Do you want to apply AI in your project? Contact me!

Do you want to improve with these tips?

Last tips -->

Use this locally

Hugging Face spaces allow us to run models with very simple demos, but what if the demo breaks? Or if the user deletes it? That's why I've created docker containers with some interesting spaces, to be able to use them locally, whatever happens. In fact, if you click on any project view button, it may take you to a space that doesn't work.

Flow edit

Flow edit Flow edit

FLUX.1-RealismLora

FLUX.1-RealismLora FLUX.1-RealismLora
View all containers -->

Do you want to apply AI in your project? Contact me!

Do you want to train your model with these datasets?

short-jokes-dataset

Dataset with jokes in English

opus100

Dataset with translations from English to Spanish

netflix_titles

Dataset with Netflix movies and series

View more datasets -->