What's My AI

Runtime fit page built from tracked setup paths and model-fit coverage.

best model

Best local models for llama.cpp

Best for people who care about low-level control, serving flags, and GGUF tuning. This page ranks cleaner starting points first, then links you into the model pages when you need exact memory and hardware guidance.

starter pick: Benchmark first
tracked models: 0
tier span: n/a
runtime tradeoff: It asks for more setup fluency than Ollama or LM Studio before you get to the first answer.

benchmark first

Benchmark before you commit to a llama.cpp download.

llama.cpp asks for more setup fluency than Ollama or LM Studio before you get to the first answer. The benchmark is still the fastest way to confirm whether this machine belongs in the size band that makes llama.cpp worth using.

start here

Best first models for llama.cpp

why this runtime

What you are choosing with llama.cpp

  • Best for: People who want the most control over quantization, serving shape, and local inference knobs.
  • Tradeoff: It asks for more setup fluency than Ollama or LM Studio before you get to the first answer.
  • Benchmark flow: Use the benchmark first when the question is about your machine, then use this page to choose the cleanest first pull inside llama.cpp.

broader catalog

More tracked models for llama.cpp

related pages

Model pages to open next

No search-intent pages are ready yet.

runtime smoke

Monthly runtime smoke matrix

Each row installs or updates the tracked runtime, downloads the starter model, and proves one local inference with the pinned prompt bundle.

These rows use hosted CPU runners so stale guidance is visible before the public install copy drifts too far from reality.

llama.cpp

Runtime guidance currently needs review

Last verified: Not yet verified

Tested runtime version: Not yet verified

Monthly smoke cadence (31-day review window)

Prompt bundle: 2026.03-reference-lm-prompts-v1
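
One way to read the 31-day review window is as a simple date check: a row needs review if it has never been verified or if its last verification is older than the window. The sketch below assumes exactly that rule. `needs_review` and `REVIEW_WINDOW` are hypothetical names, and treating day 31 itself as still inside the window is an assumption, not something this page states.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed to match the "31-day review window" shown for the monthly cadence.
REVIEW_WINDOW = timedelta(days=31)


def needs_review(last_verified: Optional[date], today: date) -> bool:
    """Return True when a row has never been verified or is past the window."""
    if last_verified is None:
        # "Not yet verified" rows always need review.
        return True
    # Assumption: exactly 31 days old is still inside the window.
    return today - last_verified > REVIEW_WINDOW
```

Under this reading, every row currently marked "Not yet verified" would be flagged regardless of the date.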

Linux

GitHub-hosted Ubuntu x64 CPU runner

Install recipe: Install the latest llama.cpp prebuilt CPU release for Ubuntu before each run.

Last verified: Not yet verified

Tested version: Not yet verified

Model pull: Granite 4.0 Micro GGUF

Stale: Latest smoke run failed during artifact collection.

macOS

GitHub-hosted macOS CPU runner

Install recipe: Install the latest llama.cpp prebuilt binary release for macOS before each run.

Last verified: Not yet verified

Tested version: Not yet verified

Model pull: Granite 4.0 Micro GGUF

Stale: Latest smoke run failed during local inference.

Windows

GitHub-hosted Windows x64 CPU runner

Install recipe: Install the latest llama.cpp prebuilt CPU release for Windows before each run.

Last verified: Not yet verified

Tested version: Not yet verified

Model pull: Granite 4.0 Micro GGUF

Stale: Latest smoke run failed during artifact collection.