Cloudflare Infire: Rust inference engine — 7% faster than vLLM, 82% less CPU

Source: blog post. Strength: strong.
Cloudflare built Infire, an LLM inference engine written in Rust. Results: ~7% faster than vLLM 0.10.0, and only ~25% CPU utilization vs vLLM's >140% (an ~82% reduction). Cuts CPU-side overhead by compiling GPU work into CUDA graphs, so many kernel launches are replayed with a single call. Powers Llama 3.1 8B on Cloudflare's edge. Real production evidence that Rust is viable for AI infrastructure.
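The blog post doesn't include Infire's source, but the CUDA-graph trick it credits for the CPU savings is a standard runtime technique: capture a sequence of kernel launches once, then replay the whole sequence with one CPU-side call. A minimal CUDA C sketch (kernel and sizes are illustrative, not from Infire):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1024;
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture a sequence of kernel launches into a graph once.
    // In an inference engine this would be the whole forward pass.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int step = 0; step < 8; ++step)
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d, 1.01f, n);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);

    // Replay: one CPU-side launch per iteration instead of 8,
    // which is where the per-request CPU overhead drops.
    for (int iter = 0; iter < 1000; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return 0;
}
```

The CPU saving scales with kernel count: a transformer forward pass issues hundreds of launches, and graph replay collapses all of that per-token launch work into one driver call.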