Whisper

March 19, 2023

Introduction

This notebook has been automatically translated to make it accessible to more people. Please let me know if you see any typos.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of supervised multilingual and multitask data collected from the web. The use of such a large and diverse data set leads to greater robustness to accents, background noise and technical language. In addition, it allows for transcription in multiple languages, as well as translation from those languages into English.

Installation

To install this tool, it is best to create a new Anaconda environment.

	
		!conda create -n whisper

We activate the environment.

	
		!conda activate whisper

We install all the necessary packages

	
		!conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

Finally we install whisper.

	
		!pip install git+https://github.com/openai/whisper.git

We also install ffmpeg, which Whisper uses to decode audio files.

	
		!sudo apt update && sudo apt install ffmpeg
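Before moving on, it can help to confirm that ffmpeg is actually reachable, since `whisper.load_audio` calls the `ffmpeg` executable to decode the file. This is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is made up for this example.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found at", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; install it before loading audio")
```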

Use

We import whisper.

	
		import whisper

We select the model. Larger models are more accurate, but they are slower and need more memory.

	
		# Uncomment the size you want to use
		# model_name = "tiny"
		# model_name = "base"
		# model_name = "small"
		# model_name = "medium"
		model_name = "large"
		model = whisper.load_model(model_name)
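As a rough guide for choosing a size, the Whisper README lists approximate VRAM requirements for each model. The sketch below picks the largest model that fits in a given amount of GPU memory. The helper `pick_model` is hypothetical, and the figures are the approximate ones from the README, so treat them as a starting point rather than hard limits.

```python
# Approximate VRAM needed per model, in GB (figures as listed in the Whisper README).
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits in available_gb."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError("not enough memory even for the tiny model")
    # Dicts preserve insertion order, so the last fitting entry is the largest model.
    return fitting[-1]


print(pick_model(6))   # medium
print(pick_model(12))  # large
```

The returned name can be passed straight to `whisper.load_model`.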

We load the audio of this old (1987) Micro Machines advert and pad or trim it to the 30 seconds the model expects.

	
		audio_path = "MicroMachines.mp3"
		audio = whisper.load_audio(audio_path)
		audio = whisper.pad_or_trim(audio)
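`whisper.pad_or_trim` is needed because the model works on 30-second windows: Whisper resamples audio to 16 kHz, so each window holds 480,000 samples. The following sketch reproduces the idea with plain Python lists. It is an illustration, not the library's implementation, which operates on NumPy arrays and tensors, and the name `pad_or_trim_sketch` is made up for this example.

```python
SAMPLE_RATE = 16_000       # Whisper resamples audio to 16 kHz
CHUNK_SECONDS = 30         # the model consumes 30-second windows
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Cut samples to `length`, or pad the end with zeros (silence)."""
    if len(samples) > length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


short = [0.1] * (5 * SAMPLE_RATE)    # 5 seconds of audio
long = [0.1] * (45 * SAMPLE_RATE)    # 45 seconds of audio
print(len(pad_or_trim_sketch(short)), len(pad_or_trim_sketch(long)))
```

Both calls return exactly `N_SAMPLES` values: the short clip is padded with silence and the long one loses everything after the first 30 seconds.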

	
		# Compute the log-Mel spectrogram and move it to the same device as the model.
		# n_mels comes from the model: newer large checkpoints use 128 Mel bins rather than 80.
		mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

	
		# Detect the spoken language; probs maps each language code to its probability
		_, probs = model.detect_language(mel)
		print(f"Detected language: {max(probs, key=probs.get)}")

	
		Detected language: en
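`probs` is a dictionary that maps language codes to probabilities, which is why `max(probs, key=probs.get)` returns the most likely language. To see the runners-up as well, it can be sorted. The values below are invented for illustration and are not real model output, and `top_languages` is a hypothetical helper.

```python
# Invented probabilities, standing in for the dict returned by model.detect_language
probs = {"en": 0.97, "es": 0.01, "de": 0.008, "fr": 0.007, "it": 0.005}


def top_languages(probs, k=3):
    """Return the k most likely (language, probability) pairs, best first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


for lang, p in top_languages(probs):
    print(f"{lang}: {p:.1%}")
```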

	
		# Decode the 30-second window with the default options
		options = whisper.DecodingOptions()
		result = whisper.decode(model, mel, options)

	
		# Show the transcribed text
		result.text

	
		"This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets. There's a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take any place. And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move. Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge. And these play sets fit together to form a micro machine world. Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all. Micro machines and micro machine pocket play sets sold separately from Galoob. The smaller they are, the better they are."
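`whisper.decode` works on a single 30-second window, the one produced by `pad_or_trim`. For longer recordings, the higher-level `model.transcribe` method processes the whole file with a sliding window and returns a dictionary containing the full `text` plus a list of `segments` with `start` and `end` times in seconds. The sketch below formats those segments. `format_timestamp` and `format_segments` are hypothetical helpers, and the sample dictionary is hand-written to mimic the shape of the result, not real output.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def format_segments(result: dict) -> list:
    """Turn the 'segments' of a transcribe() result into readable lines."""
    return [
        f"[{format_timestamp(seg['start'])} -> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result["segments"]
    ]


# Hand-written stand-in for: result = model.transcribe("MicroMachines.mp3")
result = {
    "text": " This is the Micro Machine Man. Each one has dramatic details.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " This is the Micro Machine Man."},
        {"start": 2.5, "end": 65.25, "text": " Each one has dramatic details."},
    ],
}

for line in format_segments(result):
    print(line)
```

With a real model, replacing the stand-in dictionary with the output of `model.transcribe(audio_path)` should be enough, since `transcribe` loads the file itself and no manual call to `pad_or_trim` is needed.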
