FastRTC: The Real-Time Communication Library for Python
Disclaimer: This post has been translated to English using a machine translation model. Please let me know if you find any mistakes.
In recent months, we have seen significant advancements in real-time voice models, with entire companies being founded around both open-source and closed models. Some key milestones include:
- OpenAI and Google launched their live multimodal APIs for ChatGPT and Gemini. OpenAI even launched a phone number, 1-800-ChatGPT!
- Kyutai launched Moshi, a fully open-source audio-to-audio LLM.
- Alibaba launched Qwen2-Audio, an open-source LLM that natively understands audio.
- Fixie.ai launched Ultravox, another open-source LLM that also natively understands audio.
- ElevenLabs raised 180 million dollars in its Series C.
Despite this explosion in models and funding, it remains difficult to build real-time AI applications that stream audio and video, especially in Python.
- Machine learning engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC.
- Even code assistants like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications.
That's why the announcement of FastRTC, the real-time communication library for Python, is exciting. The library is designed to make it easy to build real-time audio and video AI applications entirely in Python!
Main Features of FastRTC
- 🗣️ Automatic voice detection and built-in turn-taking, so you only have to worry about the logic for responding to the user.
- 💻 Automatic UI - Built-in, WebRTC-enabled Gradio UI for testing (or deploying to production!).
- 📞 Phone calls - Use `fastphone()` to get a free phone number to call into your audio stream (HF token required).
- ⚡️ Support for WebRTC and WebSocket.
- 💪 Customizable - You can mount the stream in any FastAPI application to serve a custom UI and deploy beyond Gradio.
- 🧰 Many utilities for text-to-speech, speech-to-text, and stop detection to help you get started.
Installation
To be able to use `FastRTC`, you first need to install the library:

```bash
pip install fastrtc
```
But if we want to use the pause detection (VAD), speech-to-text (STT), and text-to-speech (TTS) functionality, we need to install some additional dependencies:

```bash
pip install "fastrtc[vad, stt, tts]"
```
Getting Started
We will start by building the "hello world" of real-time audio: echoing what the user says. In `FastRTC`, this is as simple as:

```python
from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]) -> tuple[int, np.ndarray]:
    yield audio

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```

```
* Running on local URL: http://127.0.0.1:7872

To create a public link, set `share=True` in `launch()`.
```
When we go to the link that Gradio suggests, we first have to grant the browser permission to access the microphone. Next, this will appear:

If we click on the dropdown to the right of the word `Record`, we can select the microphone we want to use.
When we press the `Record` button, everything we say will be repeated by the application. That is, it captures the audio, detects when we have stopped speaking, and plays it back to us.
Let's break it down:
- `ReplyOnPause` handles voice detection and turn-taking for you, so you only need to worry about the logic for responding to the user. You pass it the function that will process the input audio; in our case, that's the `echo` function, which captures the input audio and returns it as a stream using `yield`. Many people don't know it, but this makes `echo` a generator, Python's mechanism for creating iterators. If you want to learn more about `yield`, you can read my post on Python. Any generator that returns an audio tuple (represented as `(sample_rate, audio_data)`) will work.
- The `Stream` class builds a Gradio UI for you to quickly test your stream. Once you have finished prototyping, you can deploy your Stream as a production-ready FastAPI application in a single line of code, as shown in the sketch below.
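For reference, here is a minimal sketch of that FastAPI deployment, based on the pattern in the FastRTC documentation (`stream.mount` is the documented entry point; details may vary between versions):

```python
# Minimal sketch: serving the echo stream from a FastAPI app instead of the Gradio UI.
# Run with: uvicorn app:app
from fastapi import FastAPI
from fastrtc import Stream, ReplyOnPause

def echo(audio):
    yield audio

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")

app = FastAPI()
stream.mount(app)  # mounts the stream's endpoints on our FastAPI app
```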
Here we can see an example from the creators of FastRTC:
Leveling Up: Voice Chat with LLM
The next level is to use an LLM to respond to the user. `FastRTC` comes with built-in `speech-to-text` and `text-to-speech` capabilities, so working with LLMs is really easy. Let's modify our `echo` function accordingly:

```python
from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model
from gradio_client import Client

client = Client("Maximofn/SmolLM2_localModel")
stt_model = get_stt_model()
tts_model = get_tts_model()

def echo(audio):
    # Transcribe the user's audio
    prompt = stt_model.stt(audio)
    # Ask the LLM backend for a response
    response = client.predict(
        message=prompt,
        system_message="You are a friendly Chatbot. Always reply in the language in which the user is writing to you.",
        max_tokens=512,
        temperature=0.7,
        top_p=0.95,
        api_name="/chat"
    )
    # Convert the response to audio and stream it back
    for audio_chunk in tts_model.stream_tts_sync(response):
        yield audio_chunk

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```
```
Loaded as API: https://maximofn-smollm2-localmodel.hf.space ✔

* Running on local URL: http://127.0.0.1:7871

To create a public link, set `share=True` in `launch()`.
```
As the `speech-to-text` model it uses `Moonshine`, which supposedly only supports English, but I have tested it in Spanish and it understands it well.
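You can also try the STT model on its own. A minimal sketch (the silent placeholder array and its shape/dtype are assumptions just to show the call signature; real input arrives from the microphone as a `(sample_rate, audio_data)` tuple):

```python
# Calling the bundled STT model directly with a (sample_rate, np.ndarray) tuple
import numpy as np
from fastrtc import get_stt_model

stt_model = get_stt_model()  # Moonshine
sample_rate = 16000
audio = np.zeros((1, sample_rate), dtype=np.float32)  # one second of silence (placeholder)
print(stt_model.stt((sample_rate, audio)))  # prints the (empty) transcription
```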
As the language model, we will use the model I deployed in a backend on Hugging Face, which I wrote about in the post Deploying a Backend with LLM on HuggingFace. It uses the LLM `HuggingFaceTB/SmolLM2-1.7B-Instruct`, a small model, since the backend runs on CPU, but it works quite well.
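If you want to sanity-check that backend on its own, you can call the Space directly with `gradio_client`, using exactly the same parameters our `echo` function passes (the test message is arbitrary):

```python
# Standalone call to the LLM backend used in this post
from gradio_client import Client

client = Client("Maximofn/SmolLM2_localModel")
answer = client.predict(
    message="Hello! Who are you?",
    system_message="You are a friendly Chatbot. Always reply in the language in which the user is writing to you.",
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    api_name="/chat",
)
print(answer)
```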
As the `text-to-speech` model it uses `Kokoro`, which does have options to speak in other languages, but this is not yet implemented in the `FastRTC` library.
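The TTS model can likewise be driven directly; this sketch just inspects the chunks that `stream_tts_sync` yields (the same method our `echo` function consumes):

```python
# Streaming speech from the bundled TTS model, chunk by chunk
from fastrtc import get_tts_model

tts_model = get_tts_model()  # Kokoro
for sample_rate, chunk in tts_model.stream_tts_sync("Hello from FastRTC!"):
    print(sample_rate, chunk.shape)  # each chunk is a numpy array of audio samples
```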
If we are really interested in using `speech-to-text` and `text-to-speech` models in other languages, we could implement them ourselves, because the greatest potential of `FastRTC` lies in the real-time communication layer, but I won't go into that in depth now.
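Just to illustrate the idea (a hypothetical sketch, not FastRTC's official extension API): our handler only ever calls `tts_model.stream_tts_sync(text)`, so any object exposing that method and yielding `(sample_rate, audio_data)` chunks could be swapped in. The engine and its `synthesize` method below are placeholders:

```python
import numpy as np

# Hypothetical drop-in TTS wrapper; `engine` stands for any multilingual TTS library
class MultilingualTTS:
    def __init__(self, engine):
        self.engine = engine

    def stream_tts_sync(self, text: str):
        # Assumption: the engine returns a sample rate and a full waveform
        sample_rate, audio = self.engine.synthesize(text)  # hypothetical API
        # Split the waveform into chunks and yield them like FastRTC's TTS models do
        for chunk in np.array_split(audio, max(1, len(audio) // 4800)):
            yield (sample_rate, chunk)
```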
Now, if we test the code we just wrote, we can chat with a voice chatbot in real time.
Phone Call
We generate a script, because this doesn't always work well from a Jupyter notebook:

```python
%%writefile fastrtc_phone_demo.py
from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model
import gradio
from gradio_client import Client
import os
from gradio.networking import setup_tunnel as original_setup_tunnel
import socket

# Monkey patch setup_tunnel so that it accepts the additional parameter
def patched_setup_tunnel(host, port, share_token, share_server_address, share_server_tls_certificate=None):
    return original_setup_tunnel(host, port, share_token, share_server_address, share_server_tls_certificate)

# Replace the original function with our patched version
gradio.networking.setup_tunnel = patched_setup_tunnel

# Get the token from the environment variable
HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN = os.getenv("HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN")

# Initialize the LLM client
llm_client = Client("Maximofn/SmolLM2_localModel")

# Initialize the STT and TTS models
stt_model = get_stt_model()
tts_model = get_tts_model()

# Define the echo function
def echo(audio):
    # Convert the audio to text
    prompt = stt_model.stt(audio)
    # Generate the response
    response = llm_client.predict(
        message=prompt,
        system_message="You are a friendly Chatbot. Always reply in the language in which the user is writing to you.",
        max_tokens=512,
        temperature=0.7,
        top_p=0.95,
        api_name="/chat"
    )
    # Convert the response to audio and stream it
    for audio_chunk in tts_model.stream_tts_sync(response):
        yield audio_chunk

def find_free_port(start_port=8000, max_port=9000):
    """Find the first free port starting from start_port."""
    print(f"Searching for a free port starting from {start_port}...")
    for port in range(start_port, max_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            result = sock.connect_ex(('127.0.0.1', port))
            if result != 0:  # If result != 0, the port is free
                print(f"Free port found: {port}")
                return port
    raise RuntimeError(f"No free port found between {start_port} and {max_port}")

free_port = find_free_port()  # Search for a free port

stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.fastphone(token=HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN, port=free_port)
```
Let's walk through the code.
The part

```python
# Monkey patch setup_tunnel so that it accepts the additional parameter
def patched_setup_tunnel(host, port, share_token, share_server_address, share_server_tls_certificate=None):
    return original_setup_tunnel(host, port, share_token, share_server_address, share_server_tls_certificate)

# Replace the original function with our patched version
gradio.networking.setup_tunnel = patched_setup_tunnel
```
is necessary because `FastRTC` is written for an older version of `gradio` that does not support the `share_server_address` parameter in the `setup_tunnel` method, so we patch it to accept the additional parameter.
Since a Hugging Face token is required, we obtain it from the `HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN` environment variable.

```python
# Get the token from the environment variable
HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN = os.getenv("HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN")
```
The LLM client, the `speech-to-text` model, and the `text-to-speech` model are created below, along with the `echo` function that will handle the input and output audio.

```python
# Initialize the LLM client
llm_client = Client("Maximofn/SmolLM2_localModel")

# Initialize the STT and TTS models
stt_model = get_stt_model()
tts_model = get_tts_model()

# Define the echo function
def echo(audio):
    # Convert the audio to text
    prompt = stt_model.stt(audio)
    # Generate the response
    response = llm_client.predict(
        message=prompt,
        system_message="You are a friendly Chatbot. Always reply in the language in which the user is writing to you.",
        max_tokens=512,
        temperature=0.7,
        top_p=0.95,
        api_name="/chat"
    )
    # Convert the response to audio and stream it
    for audio_chunk in tts_model.stream_tts_sync(response):
        yield audio_chunk
```
Since we already used port `8000` before, it may show up as occupied, so we create a function that finds a free port and use the first one it finds.
```python
def find_free_port(start_port=8000, max_port=9000):
    """Find the first free port starting from start_port."""
    print(f"Searching for a free port starting from {start_port}...")
    for port in range(start_port, max_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            result = sock.connect_ex(('127.0.0.1', port))
            if result != 0:  # If result != 0, the port is free
                print(f"Free port found: {port}")
                return port
    raise RuntimeError(f"No free port found between {start_port} and {max_port}")

free_port = find_free_port()  # Search for a free port
```
We create the stream, and this time we use `stream.fastphone()` to get a free phone number to call our stream, instead of `stream.ui.launch()`, which we used before to create the graphical interface.
```python
stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.fastphone(token=HUGGINGFACE_FASTRTC_PHONE_CALL_TOKEN, port=free_port)
```
If we run it, we will see something like this:
```bash
!python fastrtc_phone_demo.py
```

```
Loaded as API: https://maximofn-smollm2-localmodel.hf.space ✔
INFO: Warming up STT model.
INFO: STT model warmed up.
INFO: Warming up VAD model.
INFO: VAD model warmed up.
Searching for a free port starting from 8000...
Free port found: 8004
INFO: Started server process [24029]
INFO: Waiting for application startup.
INFO: Visit https://fastrtc.org/userguide/api/ for WebRTC or Websocket API docs.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8004 (Press CTRL+C to quit)
INFO: Your FastPhone is now live! Call +1 877-713-4471 and use code 994514 to connect to your stream.
INFO: You have 30:00 minutes remaining in your quota (Resetting on 2025-04-07)
INFO: Visit https://fastrtc.org/userguide/audio/#telephone-integration for information on making your handler compatible with phone usage.
```
Among the output, the key lines are:

```
INFO: Your FastPhone is now live! Call +1 877-713-4471 and use code 994514 to connect to your stream.
INFO: You have 30:00 minutes remaining in your quota (Resetting on 2025-04-07)
```
If we go to Telephone Integration in the `FastRTC` documentation, we will see that it uses Twilio to make the call. There are options to configure a local number in the United States, Dublin, Frankfurt, Tokyo, Singapore, Sydney, and São Paulo.

I tried making the call from Spain (which is going to be quite expensive for me) and it works, but it's slow: I called, entered the code, and waited for the agent to connect, but since it was taking too long, I hung up.