SWE-bench: Official Leaderboards

  • There's an all-new, challenging SWE-bench Multimodal, containing software issues described with images. Learn more here.
  • mini-SWE-agent scores up to 74% on SWE-bench Verified in 100 lines of Python code. Click here to learn more.
  • Introducing CodeClash, our new evaluation where LMs compete head-to-head to write the best codebase! Click here to learn more.

  • SWE-bench Bash Only uses the SWE-bench Verified dataset with the mini-SWE-agent environment for all models [Post].
  • SWE-bench Lite is a subset curated for less costly evaluation [Post].
  • SWE-bench Verified is a human-filtered subset [Post].
  • SWE-bench Multimodal features issues with visual elements [Post].

Each entry reports the % Resolved metric: the percentage of instances solved (out of 2,294 for Full, 500 for Verified and Bash Only, 300 for Lite, and 517 for Multimodal).
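
As a rough illustration of how the metric is computed, the sketch below uses hypothetical names (it is not part of the official SWE-bench evaluation harness): % Resolved is simply the number of resolved instances divided by the dataset size, times 100.

    # Hypothetical sketch of the % Resolved metric; `percent_resolved`,
    # `resolved_ids`, and `dataset_size` are illustrative names, not part of
    # the official SWE-bench evaluation harness.
    def percent_resolved(resolved_ids: set[str], dataset_size: int) -> float:
        """Share of dataset instances an agent resolved, as a percentage."""
        if dataset_size <= 0:
            raise ValueError("dataset_size must be positive")
        return 100.0 * len(resolved_ids) / dataset_size

    # Example: resolving 370 of the 500 SWE-bench Verified instances
    # corresponds to 74.0% Resolved.
    print(percent_resolved({f"instance-{i}" for i in range(370)}, 500))  # 74.0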

Analyze Results in Detail

News

  • [11/2025] Introducing CodeClash, our new eval of LMs as goal-oriented (not task-oriented) developers! [Link]
  • [07/2025] mini-SWE-agent scores 65% on SWE-bench Verified in 100 lines of Python code. [Link]
  • [05/2025] SWE-smith is out! Train your own models for software engineering agents. [Link]
  • [03/2025] SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! [Link]
  • [10/2024] Introducing SWE-bench Multimodal! [Link]
  • [08/2024] SWE-bench x OpenAI = SWE-bench Verified [Report]
  • [06/2024] Docker-ized SWE-bench for easier evaluation [Report]
  • [03/2024] Check out SWE-agent (12.47% on SWE-bench) [Link]
  • [03/2024] Released SWE-bench Lite [Report]

Acknowledgements

We thank the following institutions for their generous support: Open Philanthropy, AWS, Modal, Andreessen Horowitz, OpenAI, and Anthropic.

© 2025 SWE-bench Team. All rights reserved.
GitHub · HuggingFace · Paper