LLMs & Inference Tools
Large language models and inference engines for running AI models locally or in the cloud
Ollama: Run large language models locally with a simple CLI interface
llama.cpp: Port of Meta's LLaMA model in C/C++ for efficient CPU inference
vLLM: High-throughput LLM serving engine with PagedAttention
LiteLLM: Unified API for calling 100+ LLM providers using the OpenAI request format
Text Generation Inference (TGI): Hugging Face's high-performance text generation server
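A common thread among these tools is the OpenAI-compatible request format: serving engines such as vLLM and TGI expose it, and unified-API layers translate into it. As a minimal sketch (the model name and helper function here are illustrative placeholders, not part of any specific tool's API), the chat-completions payload looks like this:

```python
import json

def build_chat_request(model, prompt, temperature=0.7):
    """Build an OpenAI-format /v1/chat/completions request body.

    Engines that speak the OpenAI format accept a JSON object with a
    model name, a list of role-tagged messages, and sampling options.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Example payload; "local-model" is a placeholder model name.
payload = build_chat_request("local-model", "Hello!")
print(json.dumps(payload, indent=2))
```

Because the request shape is shared, switching between a locally served model and a hosted provider is often just a matter of changing the base URL and model name.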