Whisper

Introduction

This notebook has been automatically translated to make it accessible to more people; please let me know if you see any typos.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of supervised multilingual and multitask data collected from the web. Using such a large and diverse dataset leads to greater robustness to accents, background noise, and technical language. It also enables transcription in multiple languages, as well as translation from those languages into English.

Website

Paper

GitHub

Model card

Installation

To install this tool, it is best to create a new conda environment.

	
!conda create -n whisper

We activate the environment.

	
!conda activate whisper

We install all the necessary packages.

	
!conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

Next, we install Whisper.

	
!pip install git+https://github.com/openai/whisper.git

Finally, we install (or update) ffmpeg, which Whisper uses to read audio files.

	
!sudo apt update && sudo apt install ffmpeg

Use

We import whisper.

	
import whisper

We select the model; the larger the model, the better the transcription quality, but also the more memory it needs and the slower it runs.

	
# model = "tiny"
# model = "base"
# model = "small"
# model = "medium"
model = "large"
model = whisper.load_model(model)
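For reference, these are the approximate sizes and requirements of each model, taken from the Whisper README (figures may change between releases). The `pick_model` helper below is a hypothetical convenience, not part of the library; it just picks the largest model that fits in a given amount of VRAM:

```python
# Approximate figures from the Whisper README (may change between releases):
# parameters in millions, VRAM needed in GB, relative speed vs. "large".
MODELS = {
    "tiny":   {"params_M": 39,   "vram_GB": 1,  "rel_speed": 32},
    "base":   {"params_M": 74,   "vram_GB": 1,  "rel_speed": 16},
    "small":  {"params_M": 244,  "vram_GB": 2,  "rel_speed": 6},
    "medium": {"params_M": 769,  "vram_GB": 5,  "rel_speed": 2},
    "large":  {"params_M": 1550, "vram_GB": 10, "rel_speed": 1},
}

def pick_model(available_vram_gb: float) -> str:
    """Hypothetical helper: largest model that fits in the given VRAM."""
    fitting = [name for name, info in MODELS.items()
               if info["vram_GB"] <= available_vram_gb]
    # dicts preserve insertion order, so the last fitting entry is the largest
    return fitting[-1] if fitting else "tiny"

print(pick_model(6))   # medium
print(pick_model(16))  # large
```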

We load the audio of this old (1987) Micro Machines advert.

	
audio_path = "MicroMachines.mp3"
audio = whisper.load_audio(audio_path)
audio = whisper.pad_or_trim(audio)
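`load_audio` returns the waveform as a 16 kHz mono array, and `pad_or_trim` cuts or zero-pads it to the 30-second window that Whisper processes at a time. This is not the library's actual implementation, just a minimal sketch of what `pad_or_trim` does:

```python
import numpy as np

SAMPLE_RATE = 16000                      # Whisper works on 16 kHz mono audio
N_SAMPLES = SAMPLE_RATE * 30             # 30-second window = 480,000 samples

def pad_or_trim_sketch(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Cut audio longer than `length`; zero-pad audio shorter than it."""
    if audio.shape[0] > length:
        return audio[:length]
    return np.pad(audio, (0, length - audio.shape[0]))

short = np.ones(SAMPLE_RATE * 10, dtype=np.float32)   # 10-second clip
long_ = np.ones(SAMPLE_RATE * 45, dtype=np.float32)   # 45-second clip
print(pad_or_trim_sketch(short).shape)  # (480000,)
print(pad_or_trim_sketch(long_).shape)  # (480000,)
```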
	
mel = whisper.log_mel_spectrogram(audio).to(model.device)
	
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
	
Detected language: en
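`detect_language` returns, among other things, a dictionary mapping language codes to probabilities, and `max(probs, key=probs.get)` picks the code with the highest one. A toy illustration with a made-up `probs` dictionary:

```python
# Toy example: probs maps language codes to probabilities,
# shaped like the dictionary returned by model.detect_language
probs = {"en": 0.92, "es": 0.05, "de": 0.03}

# max with key=probs.get returns the key with the largest value
detected = max(probs, key=probs.get)
print(detected)  # en
```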
	
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
	
result.text
	
"This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets. There's a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take any place. And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move. Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge. And these play sets fit together to form a micro machine world. Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all. Micro machines and micro machine pocket play sets sold separately from Galoob. The smaller they are, the better they are."
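Note that `whisper.decode` only sees the single 30-second window we prepared with `pad_or_trim`; to transcribe longer audio, the library's higher-level `model.transcribe` method slides over the file in windows for you. As a rough illustration of the idea (not the library's actual logic), splitting a waveform into 30-second chunks looks like this:

```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK = 30 * SAMPLE_RATE  # 30-second windows, as used by Whisper

def split_into_windows(audio: np.ndarray) -> list:
    """Split a waveform into 30 s chunks; the last chunk may be shorter."""
    return [audio[i:i + CHUNK] for i in range(0, len(audio), CHUNK)]

audio = np.zeros(SAMPLE_RATE * 75, dtype=np.float32)  # a 75-second clip
windows = split_into_windows(audio)
print(len(windows))                     # 3 chunks: 30 s + 30 s + 15 s
print(len(windows[-1]) / SAMPLE_RATE)   # 15.0
```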
