Hugging Face Inference Providers
Disclaimer: This post has been translated into English using a machine translation model. Please let me know if you find any mistakes.
Hugging Face is clearly the largest hub of AI models, and it now offers the possibility of running inference on some of those models through serverless GPU providers.
One of those models is Wan-AI/Wan2.1-T2V-14B which, as of the writing of this post, is the best open-source video generation model, as can be seen in the Artificial Analysis Video Generation Arena Leaderboard.
If we look at its model card, we can see on the right a button that says Replicate.
Inference providers
If we go to the Inference Providers settings page, we will see something like this:
There we can press the button with the key to enter the API KEY of the provider we want to use, or leave the option with the two dots selected. With the first option, the provider bills us for the inference; with the second, Hugging Face bills us. So choose whichever suits you best.
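In practice, this choice shows up later in the `api_key` we pass to the inference client. A minimal sketch, assuming the billing behavior described in the Hugging Face documentation (the keys shown are placeholders):

```python
from huggingface_hub import InferenceClient

# Option 1: use the provider's own API KEY -> the provider bills us directly
client = InferenceClient(provider="replicate", api_key="r8_...")  # placeholder key

# Option 2: use our Hugging Face token -> Hugging Face bills us
client = InferenceClient(provider="replicate", api_key="hf_...")  # placeholder key
```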
Inference with Replicate
In my case, I obtained an API KEY from Replicate and added it to a file called `.env`, which is where I will store the API KEYS and which you should not upload to GitHub, GitLab, or your project's repository.
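One common way to keep it out of the repository, for example, is to list it in the project's `.gitignore`:

```
# .gitignore — keep local secrets out of version control
.env
```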
The `.env` file must have this format:
```
HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS="hf_aL...AY"
REPLICATE_API_KEY="r8_Sh...UD"
```
Reading the API Keys
The first thing we need to do is read the API KEYS from the `.env` file:
```python
import os
import dotenv

# Load the variables defined in the .env file into the environment
dotenv.load_dotenv()

REPLICATE_API_KEY = os.getenv("REPLICATE_API_KEY")
HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS = os.getenv("HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS")
```
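Optionally, we can fail early if either variable is missing, since `os.getenv` returns `None` for undefined variables:

```python
# Optional sanity check: os.getenv returns None for missing variables
if REPLICATE_API_KEY is None or HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS is None:
    raise ValueError("Missing API KEYS: check your .env file")
```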
Logging in to the Hugging Face Hub
Since the Wan-AI/Wan2.1-T2V-14B model is hosted on the Hugging Face Hub, we need to log in to be able to use it.
```python
from huggingface_hub import login

# Authenticate against the Hugging Face Hub
login(HUGGINGFACE_TOKEN_INFERENCE_PROVIDERS)
```
Inference Client
Now we create an inference client. We have to specify the provider and the API KEY, and in this case we also set a `timeout` of 1000 seconds, because the default is 60 seconds and the model takes quite a while to generate the video.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="replicate",
    api_key=REPLICATE_API_KEY,
    timeout=1000,  # the default of 60 seconds is too short for video generation
)
```
Video Generation
We now have everything we need to generate our video. We use the client's `text_to_video` method, pass it the prompt, and tell it which model from the Hub we want to use; otherwise it will use the default one.
```python
video = client.text_to_video(
    "Funky dancer, dancing in a rehearsal room. She wears long hair that moves to the rhythm of her dance.",
    model="Wan-AI/Wan2.1-T2V-14B",
)
```
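The method also accepts some optional generation parameters. Below is a sketch; whether each parameter is honored depends on the provider, so treat the values as illustrative assumptions:

```python
# Illustrative only: optional parameters of text_to_video
# (support for each one depends on the provider)
video = client.text_to_video(
    "Funky dancer, dancing in a rehearsal room.",
    model="Wan-AI/Wan2.1-T2V-14B",
    num_inference_steps=30,  # assumed value: more steps, slower but finer
    seed=42,                 # for reproducibility, if the provider supports it
)
```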
Saving the video
Finally, we save the video, which is of type `bytes`, to a file on our disk.
```python
output_path = "output_video.mp4"

# The client returns the video as raw bytes, so we write in binary mode
with open(output_path, "wb") as f:
    f.write(video)

print(f"Video saved to: {output_path}")
```
```
Video saved to: output_video.mp4
```
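As a quick check, assuming the file was written next to the script, we can print its size on disk:

```python
import os

# Quick check: a non-empty file suggests the generation completed
print(f"{os.path.getsize(output_path)} bytes written")
```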
Generated video
This is the video generated by the model: