SWE-bench

SWE-bench is a benchmark for evaluating LLMs on real-world software engineering tasks.

Description

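Each task pairs a real GitHub issue with the state of the repository it was filed against. The model must generate a patch that resolves the issue, and the patch is judged by running the repository's tests against the patched code.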
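The pass/fail decision for a single task can be sketched as follows. This is a minimal illustration, not the benchmark's own harness: `TaskInstance`, `is_resolved`, and `resolved_rate` are hypothetical names, and the split into tests that the fix should make pass and tests that must keep passing is an assumption modeled on the `FAIL_TO_PASS` and `PASS_TO_PASS` fields of the published dataset.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Hypothetical, simplified view of one benchmark task."""
    instance_id: str
    problem_statement: str  # text of the GitHub issue
    fail_to_pass: list[str] = field(default_factory=list)  # tests the fix should make pass
    pass_to_pass: list[str] = field(default_factory=list)  # tests that must keep passing


def is_resolved(task: TaskInstance, test_results: dict[str, bool]) -> bool:
    """Return True if every required test passed after the patch was applied.

    A test that is missing from test_results is treated as a failure.
    """
    required = task.fail_to_pass + task.pass_to_pass
    return all(test_results.get(name, False) for name in required)


def resolved_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

In this sketch a patch counts only if it fixes the reported behavior without breaking tests that passed before, which is why both lists are checked together.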