# Local LLM

**Source:** https://promtable.com/glossary/local-llm

> A local LLM is a language model that runs entirely on the user's own machine — laptop, desktop, or self-hosted server — rather than via a cloud API, trading some quality for privacy, offline access, and zero per-token cost.

---

By 2026, local LLMs (Llama 4 Maverick variants, Qwen 2.5, Mistral Small / Nemo, DeepSeek-R1-Distill) deliver useful quality on consumer GPUs and Apple Silicon. Common runtimes are Ollama (CLI), LM Studio (GUI), llama.cpp (low-level C/C++ inference engine), and vLLM (production serving).

Typical use cases include privacy-sensitive workflows (legal, medical, internal documents), offline tools, cost-sensitive bulk inference, and agents you want to run without API rate limits. The trade-offs: quality still lags frontier APIs on hard reasoning, larger models need 24 to 80 GB of GPU memory, and latency depends on your hardware.
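The GPU-memory figures above follow roughly from simple arithmetic: weight memory is about parameter count × bits per weight ÷ 8. The Python sketch below assumes exactly that formula and deliberately ignores the KV cache, activations, and runtime overhead, so treat its output as a lower bound. Real 4-bit quantization formats also store scale factors and land slightly above 4 bits per weight.

```python
# Rough lower-bound estimate of the memory needed just to hold model weights.
# Assumption: memory ~= parameter count * bits per weight / 8.
# KV cache, activations and runtime overhead are NOT included.


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Compare full-precision (16-bit) weights with common quantized widths.
    for name, params in [("8B", 8), ("32B", 32), ("70B", 70)]:
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

For example, a 70B model needs about 140 GB at 16-bit but about 35 GB at 4-bit, which is why quantized weights are what bring models of that size within reach of a single workstation GPU or a high-memory Apple Silicon machine.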

## When to use

- Privacy-sensitive workflows (legal, medical, internal data).
- Offline or edge deployment.
- Cost-sensitive bulk inference.
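For the bulk-inference case, a local runtime can be driven like any HTTP API, just without per-token billing. The sketch below assumes an Ollama server running on its default port (11434) and calls its `/api/generate` endpoint with streaming disabled, using only the Python standard library. The model tag `llama3.1:8b` and the helper names (`build_request`, `generate`, `summarize_all`) are hypothetical placeholders; substitute any model you have pulled locally.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"  # hypothetical model tag; use one you have pulled


def build_request(prompt: str, model: str = MODEL) -> bytes:
    """Encode one non-streaming /api/generate request body as JSON bytes."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")


def generate(prompt: str, model: str = MODEL, url: str = OLLAMA_URL,
             timeout: float = 300.0) -> str:
    """Send one prompt to the local server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream=False the whole completion arrives in the "response" field.
    return body["response"]


def summarize_all(docs: list[str], model: str = MODEL) -> list[str]:
    """Hypothetical bulk job: one sequential request per document."""
    return [generate(f"Summarize in one sentence:\n\n{doc}", model) for doc in docs]
```

Calling `summarize_all(docs)` with a list of strings returns one summary per document. Requests are sent one at a time for simplicity; for higher throughput, a serving engine such as vLLM with batched inference is the usual next step.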

## Common mistakes

- Expecting frontier-API quality from 8B-class local models.
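- Forgetting that long-context inference needs substantial extra RAM / VRAM: the KV cache grows linearly with context length, on top of the memory already used by the weights.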
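To see why context length matters, the KV cache can be estimated as 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per value. The Python sketch below applies that formula to an assumed Llama-3-8B-style configuration (32 layers, 8 KV heads, head dimension 128, 16-bit cache values); these numbers are illustrative, so read the real ones from your model's config file.

```python
# KV-cache size estimate for a decoder-only transformer.
# ASSUMED_CONFIG is a hypothetical Llama-3-8B-style example, not a measured value.


def kv_cache_bytes(context_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2,
                   batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Factor 2: each layer stores one key and one value vector
    # per KV head for every cached token.
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * bytes_per_value * batch_size)


ASSUMED_CONFIG = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx, **ASSUMED_CONFIG) / 2**30
        print(f"{ctx:>7} tokens: {gib:.1f} GiB of KV cache (16-bit)")
```

With these assumed values the cache costs 128 KiB per token, so roughly 1 GiB at 8K tokens and 16 GiB at 128K tokens, before counting the weights themselves.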

## Related terms

- [fine-tuning](https://promtable.com/glossary/fine-tuning)
- [openrouter](https://promtable.com/glossary/openrouter)
- [batched-inference](https://promtable.com/glossary/batched-inference)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/local-llm".
Contact: info@vibecodingturkey.com.