HuggingFace TGI: Rust HTTP server + scheduler for production LLM inference
Source: repo · Strength: strong
HuggingFace Text Generation Inference (TGI) uses Rust for the HTTP server and scheduling layers and Python for model execution. 10,811 GitHub stars. It powers HuggingChat and the Inference API in production. TGI v3.0 claims up to 13x faster performance than vLLM on long prompts. Demonstrates the "Python for models, Rust for orchestration" pattern.
Published January 1, 2024
Added March 21, 2026