start here
Best first models for llama.cpp
gpt-oss-20b
34B class • 15.5 GB minimum
Community GGUF packaging gives llama.cpp a direct path.
OLMo 3.1 Instruct 32B
34B class • 19.5 GB minimum
Community GGUF import path for llama.cpp.
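If the machine clears this band, the first run can be a single command. A minimal sketch, assuming a recent llama.cpp build with Hugging Face download support on your PATH; the repo name below is an assumption, so confirm it on Hugging Face before relying on it:

```bash
# One-shot prompt against a community GGUF pulled straight from Hugging Face.
# Repo name is an assumption; substitute whichever GGUF packaging you trust.
llama-cli -hf ggml-org/gpt-oss-20b-GGUF \
  -no-cnv -p "Reply with the single word: ready" -n 16
```

The OLMo starter follows the same pattern, with its own community GGUF repo in place of the gpt-oss one.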
best model
Best for people who care about low-level control, serving flags, and GGUF tuning. This page ranks cleaner starting points first, then links you into the model pages when you need exact memory and hardware guidance.
benchmark first
llama.cpp asks for more setup fluency than Ollama or LM Studio before you get a first answer. The benchmark is still the fastest way to confirm whether this machine sits in a size band that makes llama.cpp worth the effort.
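That trade cuts both ways: the same flags that raise the entry bar are the reason to be here. A hedged sketch of a first serving command, with a placeholder model path and illustrative, untuned values:

```bash
# Serve a local GGUF over llama.cpp's built-in HTTP server.
# -c sets the context window, -ngl offloads layers to the GPU (use 0 on
# CPU-only machines), -t pins CPU threads; every value here is illustrative.
llama-server -m ./models/gpt-oss-20b.gguf \
  -c 8192 -ngl 99 -t 8 \
  --host 127.0.0.1 --port 8080
```

Once up, llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, which is where most of the flag-by-flag tuning loop actually happens.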
broader catalog
3B class • 2.5 GB minimum
IBM's smallest Granite 4.0 instruct release is a pragmatic US-origin starter for local chat, extraction, and agent scaffolding.
7B class • 5.0 GB minimum
Ai2's 7B instruct release is the clearest Apache-licensed American alternative to Llama when you want a smaller fully open local model.
7B class • 6.5 GB minimum
Meta's 8B instruct release remains the safest broad-compatibility US local model when you want maximum runtime coverage.
13B class • 8.5 GB minimum
Phi-4-reasoning is the clearest text-first American recommendation around the 13B class when you care about reasoning quality more than multimodal extras.
13B class • 11.0 GB minimum
Gemma 3 12B stays interesting when you want a smaller multimodal American model, but it is less turnkey than Phi-4-reasoning for plain text work.
34B class • 19.5 GB minimum
Granite 4.0 H-Small is a credible American midrange choice for RAG-heavy work, but it is more specialized than the general-purpose winners above it.
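The GB minimums above generally track a small quantization of each model; heavier quants cost more memory and give back quality. When a community repo ships several quantizations, llama.cpp can select one with a tag suffix on the Hugging Face shorthand. A sketch, with an assumed repo and tag:

```bash
# The :TAG suffix picks a specific quantization from a multi-quant repo.
# Q4_K_M sits near the listed minimums; Q8_0 roughly doubles the footprint.
# Repo name and tag are assumptions; check what the repo actually ships.
llama-cli -hf ibm-granite/granite-4.0-micro-GGUF:Q4_K_M \
  -no-cnv -p "Reply with: ok" -n 8
```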
related pages
Model page
Search intent: gpt-oss-20b can i run it
34B class start • 15.5 GB minimum
OpenAI
Model page
Search intent: olmo 3.1 instruct 32b can i run it
34B class start • 19.5 GB minimum
Ai2
runtime smoke
Each row installs or updates the tracked runtime, downloads the starter model, and proves one local inference with the pinned prompt bundle. The rows run on hosted CPU runners so stale guidance surfaces before the public install copy drifts too far from reality; a command-level sketch of the Ubuntu row follows the matrix.
llama.cpp
Last verified: Not yet verified
Tested runtime version: Not yet verified
Monthly smoke cadence (31-day review window)
Prompt bundle: 2026.03-reference-lm-prompts-v1
GitHub-hosted Ubuntu x64 CPU runner
Install recipe: Install the latest llama.cpp prebuilt CPU release for Ubuntu before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: No successful monthly smoke run recorded yet.
GitHub-hosted macOS CPU runner
Install recipe: Install the latest llama.cpp prebuilt binary release for macOS before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: No successful monthly smoke run recorded yet.
GitHub-hosted Windows x64 CPU runner
Install recipe: Install the latest llama.cpp prebuilt CPU release for Windows before each run.
Last verified: Not yet verified
Tested version: Not yet verified
Model pull: Granite 4.0 Micro GGUF
Stale: No successful monthly smoke run recorded yet.
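Condensed into commands, the Ubuntu row could look like the sketch below. It is hedged throughout: the release asset naming and archive layout are assumptions to verify against the llama.cpp releases page, the GGUF repo name is an assumption, and the pinned prompt bundle is stubbed with a single prompt.

```bash
# 1. Install the latest prebuilt CPU release. jq is assumed available on the
#    runner; the asset name pattern is an assumption, so verify it first.
TAG=$(curl -s https://api.github.com/repos/ggml-org/llama.cpp/releases/latest \
      | jq -r .tag_name)
curl -LO "https://github.com/ggml-org/llama.cpp/releases/download/${TAG}/llama-${TAG}-bin-ubuntu-x64.zip"
unzip -q "llama-${TAG}-bin-ubuntu-x64.zip" -d llama

# 2. Pull the starter model and prove one local inference on the CPU runner.
#    Binary path inside the archive and the GGUF repo name are assumptions.
./llama/build/bin/llama-cli -hf ibm-granite/granite-4.0-micro-GGUF \
  -no-cnv -p "Reply with the single word: ready" -n 8
```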