# SWE-bench

**Source:** https://promtable.com/glossary/swe-bench

> SWE-bench is the standard benchmark for autonomous coding agents — real GitHub issues from popular Python repos paired with the actual fix commit; the agent must produce a patch that passes the hidden test suite.

---

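SWE-bench was introduced in 2023 by researchers at Princeton and quickly became the standard leaderboard for software-engineering agents. Each task instance pairs an issue from a real open-source repository with a snapshot of that repository at the commit the fix was made against, plus hidden tests taken from the fixing pull request. The agent receives the issue text and the repository and must produce a diff. The task counts as resolved only if, after the patch is applied, the tests the fix was meant to make pass now pass and the previously passing tests still pass.

Variants: SWE-bench Lite (a smaller subset filtered toward self-contained bug fixes that are cheaper to evaluate), SWE-bench Verified (a human-vetted subset whose issues and tests were checked for being well specified and fair), and SWE-bench Multilingual (tasks in languages beyond Python). As of 2026, top scores on Verified cluster around 60-70%, while humans reach about 95%; leaderboard figures move quickly, so check the current board before citing them. Production caveat: SWE-bench performance is necessary but not sufficient. Agents that score well can still fail on real-world tickets because of dependency setup (the benchmark supplies a prepared environment), multi-repo context, or ambiguous specs.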
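To make the task format and the pass/fail rule concrete, here is a minimal Python sketch. The fields `instance_id`, `repo`, `base_commit`, `problem_statement`, `FAIL_TO_PASS` and `PASS_TO_PASS` (snake-cased below) mirror the public dataset, and `instance_id` / `model_name_or_path` / `model_patch` mirror the prediction records the official harness reads. The `TaskInstance` class, the helper functions, the `"PASSED"` status string, and the toy data are illustrative assumptions, not the harness's own code; in the real harness the tests run inside a per-instance Docker container and statuses are parsed from its logs.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """A few of the fields in a SWE-bench task instance (the dataset has more)."""
    instance_id: str         # format: "<owner>__<repo>-<PR number>"
    repo: str                # GitHub "owner/name"
    base_commit: str         # repository state the agent starts from
    problem_statement: str   # issue text shown to the agent
    fail_to_pass: list[str]  # tests the fix must turn from failing to passing
    pass_to_pass: list[str]  # tests that passed before and must keep passing


def to_prediction(task: TaskInstance, model_name: str, diff: str) -> dict:
    """Wrap an agent's diff as a prediction record (one JSON line per task)."""
    return {
        "instance_id": task.instance_id,
        "model_name_or_path": model_name,
        "model_patch": diff,
    }


def is_resolved(task: TaskInstance, test_status: dict[str, str]) -> bool:
    """True only if every FAIL_TO_PASS and PASS_TO_PASS test passed.

    `test_status` (hypothetical) maps test name -> status after the patch was
    applied; a test missing from the map counts as not passing.
    """
    required = task.fail_to_pass + task.pass_to_pass
    return all(test_status.get(name) == "PASSED" for name in required)


def resolve_rate(results: list[bool]) -> float:
    """Headline metric: percentage of task instances resolved."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Toy usage; the repo, tests, and diff below are made up for illustration.
task = TaskInstance(
    instance_id="example__demo-1",
    repo="example/demo",
    base_commit="abc123",
    problem_statement="Parser crashes on empty input",
    fail_to_pass=["tests/test_parse.py::test_empty"],
    pass_to_pass=["tests/test_parse.py::test_basic"],
)
print(json.dumps(to_prediction(task, "my-agent", "<unified diff>")))
status = {
    "tests/test_parse.py::test_empty": "PASSED",
    "tests/test_parse.py::test_basic": "PASSED",
}
print(is_resolved(task, status))                # True
print(resolve_rate([True, False, True, True]))  # 75.0
```

The all-or-nothing check is the design point: a patch that fixes the issue but breaks a previously passing test still scores zero for that instance.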

## When to use

- Comparing autonomous coding agents.
- Tracking progress over time.
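When comparing agents or tracking one agent across versions, scores are only comparable on the same task set (the full set, Lite, and Verified contain different instances). A minimal sketch, assuming you already have per-instance resolved/unresolved results for each run, for example extracted from harness reports; the `compare_agents` function and its dict input format are hypothetical.

```python
def compare_agents(a: dict[str, bool], b: dict[str, bool]) -> dict:
    """Compare two runs on the instance IDs both attempted.

    `a` and `b` map instance_id -> resolved. Restricting to shared IDs keeps
    the comparison on the same task set.
    """
    shared = sorted(a.keys() & b.keys())
    if not shared:
        raise ValueError("the two runs share no instance IDs")

    def rate(run: dict[str, bool]) -> float:
        # Percentage of shared instances this run resolved.
        return 100.0 * sum(run[i] for i in shared) / len(shared)

    return {
        "n_shared": len(shared),
        "rate_a": rate(a),
        "rate_b": rate(b),
        "only_a": [i for i in shared if a[i] and not b[i]],  # solved by a only
        "only_b": [i for i in shared if b[i] and not a[i]],  # solved by b only
    }


# Toy usage: two checkpoints of the same agent (made-up instance IDs).
old = {"t1": True, "t2": False, "t3": True}
new = {"t1": True, "t2": True, "t4": True}
print(compare_agents(old, new))
# {'n_shared': 2, 'rate_a': 50.0, 'rate_b': 100.0, 'only_a': [], 'only_b': ['t2']}
```

Listing `only_a` and `only_b` next to the headline rates shows whether a gain comes from newly solved tasks or hides regressions elsewhere.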

## Common mistakes

- Reading SWE-bench scores as production capability — real tickets are harder.

## Related terms

- [autonomous-coder](https://promtable.com/glossary/autonomous-coder)
- [evals](https://promtable.com/glossary/evals)
- [agent-tracing](https://promtable.com/glossary/agent-tracing)

## Sources

- [SWE-bench leaderboard](https://www.swebench.com/)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/swe-bench".
Contact: info@vibecodingturkey.com.