Blip 2
Introduction
This notebook has been automatically translated to make it accessible to more people; please let me know if you spot any typos.
Blip2 is an artificial intelligence model that takes an image or a video as input and can hold a conversation about it, answering questions or describing what the input shows with remarkable accuracy 🤯
Installation
In order to install this tool, it is best to create a new anaconda environment.
!$ conda create -n blip2 python=3.9
Now we activate the environment
!$ conda activate blip2
We install all the necessary modules
!$ conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
!$ conda install -c anaconda pillow
!$ conda install -y -c anaconda requests
!$ conda install -y -c anaconda jupyter
Finally we install blip2
!$ pip install salesforce-lavis
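To make sure everything is installed correctly, you can run a quick optional check (a small sketch, assuming the blip2 environment is still active): it should print the PyTorch version, whether CUDA is visible, and import lavis without errors.
!$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
!$ python -c "import lavis; print('lavis ok')"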
Use
We load the necessary libraries
import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess
We load an example image
img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg/800px-12_-_The_Mystical_King_Cobra_and_Coffee_Forests.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
display(raw_image.resize((500, 500)))
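If you prefer to use one of your own images instead of downloading one, you can load it from disk; the file name below is just an example.
# Alternative: load a local file instead of downloading one (the path is hypothetical)
# raw_image = Image.open("my_image.jpg").convert('RGB')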
We select the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
device(type='cuda')
We assign a model. In my case, with 32 GB of RAM and a 3060 GPU with 12 GB of VRAM, I cannot use all of them, so next to each model I have added a comment: ok for the ones I was able to use and, for the ones that failed, the error they gave me. If your computer has the same RAM and VRAM you already know which ones you can use; if not, you will have to test them yourself (a snippet after the next code block shows how to list every model that LAVIS provides).
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok
# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM
# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM
# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM
# name = "blip2"; model_type = "pretrain" # FAIL type error
# name = "blip2"; model_type = "coco" # ok
name = "blip2_t5"; model_type = "pretrain_flant5xl" # ok
# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM
# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAIL
model, vis_processors, _ = load_model_and_preprocess(
name=name, model_type=model_type, is_eval=True, device=device
)
vis_processors.keys()
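If you want to see every architecture and model type that your LAVIS installation ships with, you can print its model zoo. This is a minimal sketch; model_zoo is exposed by lavis.models in recent releases, so check it against your installed version.
from lavis.models import model_zoo
print(model_zoo)  # table of architectures and their available model types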
We prepare the image to feed it into the model.
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM# name = "blip2"; model_type = "pretrain" # FAIL type error# name = "blip2"; model_type = "coco" # okname = "blip2_t5"; model_type = "pretrain_flant5xl" # ok# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAILmodel, vis_processors, _ = load_model_and_preprocess(name=name, model_type=model_type, is_eval=True, device=device)vis_processors.keys()image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
We analyze the image without asking any questions
# name = "blip2_opt"; model_type = "pretrain_opt2.7b" # ok# name = "blip2_opt"; model_type = "caption_coco_opt2.7b" # FAIL VRAM# name = "blip2_opt"; model_type = "pretrain_opt6.7b" # FAIL RAM# name = "blip2_opt"; model_type = "caption_coco_opt6.7b" # FAIL RAM# name = "blip2"; model_type = "pretrain" # FAIL type error# name = "blip2"; model_type = "coco" # okname = "blip2_t5"; model_type = "pretrain_flant5xl" # ok# name = "blip2_t5"; model_type = "caption_coco_flant5xl" # FAIL VRAM# name = "blip2_t5"; model_type = "pretrain_flant5xxl" # FAILmodel, vis_processors, _ = load_model_and_preprocess(name=name, model_type=model_type, is_eval=True, device=device)vis_processors.keys()image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)model.generate({"image": image})
['a black and white snake']
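generate also accepts sampling parameters; as an example (the parameter names below are taken from the LAVIS examples, so check them against your installed version) you can ask for several caption candidates with nucleus sampling.
# Generate 3 caption candidates with nucleus sampling instead of beam search
model.generate({"image": image}, use_nucleus_sampling=True, num_captions=3)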
We analyze the image by asking it questions. To keep the conversation coherent, each answer is appended to the prompt before the next question.
prompt = None
def prepare_prompt(prompt, question):
    # Append the new question to the running prompt (or start a new one)
    if prompt is None:
        prompt = question + " Answer:"
    else:
        prompt = prompt + " " + question + " Answer:"
    return prompt
def get_answer(prompt, question, model):
    # Ask the model and append its answer to the prompt so the conversation keeps its context
    prompt = prepare_prompt(prompt, question)
    answer = model.generate({"image": image, "prompt": prompt})
    answer = answer[0]
    prompt = prompt + " " + answer + "."
    return prompt, answer
question = "What's in the picture?"
prompt, answer = get_answer(prompt, question, model)
print(f"Question: {question}")
print(f"Answer: {answer}")
Question: What's in the picture?
Answer: a snake
question = "What kind of snake?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: What kind of snake?
Answer: cobra
question = "Is it poisonous?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: Is it poisonous?
Answer: yes
question = "If it bites me, can I die?"prompt, answer = get_answer(prompt, question, model)print(f"Question: {question}")print(f"Answer: {answer}")
Question: If it bites me, can I die?
Answer: yes
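Since prompt accumulates each question and answer, you can chain a whole conversation in a loop. This is just a usage sketch built on the helpers defined above.
# Start a fresh conversation and ask several questions in a row
prompt = None
questions = [
    "What's in the picture?",
    "What kind of snake?",
    "Is it poisonous?",
]
for question in questions:
    prompt, answer = get_answer(prompt, question, model)
    print(f"Question: {question}")
    print(f"Answer: {answer}")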