Featured Tool

vLLM

High-throughput LLM serving engine with PagedAttention

Open Source · Self-Hosted · Offline Capable · GPU Required (16GB+ VRAM)

About

vLLM is a fast and easy-to-use library for LLM inference and serving. It achieves high throughput with PagedAttention, an attention algorithm that efficiently manages attention key and value memory.
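
For context, the snippet below is a minimal sketch of vLLM's offline batched-inference Python API, assuming a recent vLLM release; the model name and sampling values are placeholder examples, not recommendations.

    from vllm import LLM, SamplingParams

    # Example prompts; vLLM batches these and schedules their KV cache
    # in fixed-size blocks via PagedAttention.
    prompts = [
        "The capital of France is",
        "Briefly explain paged attention:",
    ]

    # Example sampling settings; tune these for your use case.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Placeholder model; any Hugging Face model vLLM supports works here.
    llm = LLM(model="facebook/opt-125m")

    # generate() runs the whole batch and returns one RequestOutput per prompt.
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(f"{output.prompt!r} -> {output.outputs[0].text!r}")

For serving over HTTP, vLLM also ships an OpenAI-compatible API server, typically launched with python -m vllm.entrypoints.openai.api_server --model <model>.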

Reviews (0)

No reviews yet.

Details

Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: Apache-2.0
Minimum VRAM: 16 GB
Added: Jan 29, 2026

Similar Tools

Featured

Ollama
Run large language models locally with a simple CLI interface
Open Source · Self-Hosted · Offline · Beginner
Featured

llama.cpp
Port of Meta's LLaMA model in C/C++ for efficient CPU inference
Open Source · Self-Hosted · Offline · Intermediate

Text Generation Inference
Hugging Face's high-performance text generation server
Open Source · Self-Hosted · Offline · GPU 16GB+ · Advanced