# Async inference (Batch API)

**Source:** https://promtable.com/glossary/async-inference

> Async inference (also called Batch API) submits LLM jobs that complete within 24 hours instead of seconds — used for non-interactive workloads at half the per-token price or less.

---
Async inference (also called Batch API) submits LLM jobs that complete within 24 hours instead of seconds — used for non-interactive workloads at half the per-token price or less.

Most major providers in 2026 offer a Batch API (OpenAI, Anthropic, Google) that takes a JSONL file of requests and returns results within 24 hours at 50% or more discount. The infrastructure runs your requests during low-demand windows, packing them into idle GPU time. Use cases: nightly classification of new content, bulk embedding generation, eval runs, offline data labelling, periodic synthesis. Not appropriate for any interactive workload because completion is not guaranteed faster than 24 hours.

## When to use

- Eval suites over thousands of test cases.
- Bulk embedding or classification of new content.
- Offline content moderation passes.

## Common mistakes

- Using async for interactive paths — users will not wait.
- Forgetting that batch jobs have their own rate limits and SLAs.

## Related terms

- [batched-inference](https://promtable.com/glossary/batched-inference)
- [rate-limit](https://promtable.com/glossary/rate-limit)
- [evals](https://promtable.com/glossary/evals)

*Last updated: 2026-06-01*
---

Original page: https://promtable.com/glossary/async-inference
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/async-inference".
Contact: info@vibecodingturkey.com.