start here
best model
Best local models for llama.cpp
Best for people who care about low-level control, serving flags, and GGUF tuning. This page ranks the cleanest starting points first, then links to the model pages for exact memory and hardware guidance.
benchmark first
Benchmark before you commit to a llama.cpp download.
llama.cpp asks for more setup fluency than Ollama or LM Studio before you get to the first answer. The benchmark is still the fastest way to confirm whether your machine falls in the size band that makes llama.cpp worth using.
why this runtime
What you are choosing with llama.cpp
- Best for: People who want the most control over quantization, serving shape, and local inference knobs.
- Tradeoff: It asks for more setup fluency than Ollama or LM Studio before you get to the first answer.
- Benchmark flow: Use the benchmark first when the question is about your machine, then use this page to choose the cleanest first pull inside llama.cpp.
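This page does not name a specific benchmark tool. As one possible illustration, the sketch below assumes the `llama-bench` binary that ships with llama.cpp is on your PATH and that you already have a GGUF file on disk. The helper name `build_bench_command` and the model file name are hypothetical, and the function only assembles the command line without running anything.

```python
import shlex


def build_bench_command(model_path, prompt_tokens=512, gen_tokens=128, threads=None):
    """Assemble an argv list for llama-bench. Nothing is executed here."""
    argv = [
        "llama-bench",
        "-m", model_path,          # path to the GGUF model file
        "-p", str(prompt_tokens),  # prompt-processing test size, in tokens
        "-n", str(gen_tokens),     # text-generation test size, in tokens
    ]
    if threads is not None:
        argv += ["-t", str(threads)]  # CPU thread count to benchmark with
    return argv


# Hypothetical file name; substitute the GGUF you actually downloaded.
cmd = build_bench_command("granite-4.0-micro.gguf", threads=4)
print(shlex.join(cmd))
```

Building the argument list separately makes it easy to inspect or log the exact command before passing it to something like `subprocess.run(cmd, check=True)`. Check `llama-bench --help` on your installed build, since available flags can differ between releases.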
broader catalog
More tracked models for llama.cpp
related pages
Model pages to open next
No search-intent pages are ready yet.
runtime smoke
Monthly runtime smoke matrix
Each row installs or updates the tracked runtime, downloads the starter model, and runs one local inference against the pinned prompt bundle to prove the setup works.
These rows use hosted CPU runners so stale guidance is visible before the public install copy drifts too far from reality.
llama.cpp
Runtime guidance currently needs review
Last verified: Not yet verified
Tested runtime version: Not yet verified
Monthly smoke cadence (31-day review window)
Prompt bundle: 2026.03-reference-lm-prompts-v1
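To make the 31-day review window concrete, here is a minimal sketch of a staleness check. It is an assumption about how such a rule could work, not the site's actual logic: `needs_review` is a hypothetical helper, a missing date stands in for "Not yet verified", and a run exactly 31 days old is treated as still inside the window.

```python
from datetime import date, timedelta
from typing import Optional

REVIEW_WINDOW_DAYS = 31  # taken from the cadence note above


def needs_review(
    last_verified: Optional[date],
    today: date,
    window_days: int = REVIEW_WINDOW_DAYS,
) -> bool:
    """Return True when guidance has never been verified or has aged out."""
    if last_verified is None:
        # Assumption: "Not yet verified" always needs review.
        return True
    # Assumption: the boundary day itself still counts as fresh.
    return (today - last_verified) > timedelta(days=window_days)


print(needs_review(None, date(2026, 3, 15)))              # prints True
print(needs_review(date(2026, 3, 1), date(2026, 3, 20)))  # prints False
```

Passing `today` explicitly instead of calling `date.today()` inside the function keeps the check deterministic and easy to test.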
Linux
GitHub-hosted Ubuntu x64 CPU runner
Install recipe: Install the latest llama.cpp prebuilt CPU release for Ubuntu before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: Latest smoke run failed during artifact collection.
macOS
GitHub-hosted macOS CPU runner
Install recipe: Install the latest llama.cpp prebuilt binary release for macOS before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: Latest smoke run failed during local inference.
Windows
GitHub-hosted Windows x64 CPU runner
Install recipe: Install the latest llama.cpp prebuilt CPU release for Windows before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: Latest smoke run failed during artifact collection.