MBPP (Mostly Basic Python Problems)
MBPP is a benchmark designed to evaluate the code generation performance of LLMs on basic Python tasks.
Description
The benchmark consists of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem includes a natural-language task description, a reference code solution, and three automated test cases.
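To make the problem format concrete, here is a minimal sketch of one MBPP-style record and a checker that treats a candidate solution as correct only if every assert-style test passes. The problem itself is hypothetical, and the field names (`task_id`, `text`, `code`, `test_list`) are modeled on the original release but should be treated as an assumption. `passes_all_tests` is an illustrative helper, not part of any official harness.

```python
from typing import Any, Dict, List

# Hypothetical MBPP-style record (not a real MBPP task); the field names are
# an assumption modeled on the original dataset release.
problem: Dict[str, Any] = {
    "task_id": 0,
    "text": "Write a function to return the sum of the squares of a list of integers.",
    "code": "def sum_of_squares(nums):\n    return sum(n * n for n in nums)",
    "test_list": [
        "assert sum_of_squares([1, 2, 3]) == 14",
        "assert sum_of_squares([]) == 0",
        "assert sum_of_squares([-2, 5]) == 29",
    ],
}


def passes_all_tests(candidate_code: str, tests: List[str]) -> bool:
    """Return True only if the candidate defines code that passes every test.

    WARNING: exec() runs untrusted model output in-process; a real evaluation
    harness should sandbox execution and enforce a timeout.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function(s)
        for test in tests:
            exec(test, namespace)  # a failing assert raises AssertionError
    except Exception:  # syntax errors, runtime errors, and failed asserts
        return False
    return True


if __name__ == "__main__":
    print(passes_all_tests(problem["code"], problem["test_list"]))  # True
    buggy = "def sum_of_squares(nums):\n    return sum(nums)"
    print(passes_all_tests(buggy, problem["test_list"]))  # False
```

Requiring all three tests to pass mirrors how a problem counts as solved in the metrics that follow; a partially correct solution scores nothing.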
Key Metrics
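- Pass@1: Fraction of problems for which a single generated solution passes all of its test cases.
- Pass@k: Fraction of problems for which at least one of k generated solutions passes all of its test cases.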
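A common way to compute pass@k is the unbiased estimator introduced with the Codex/HumanEval work (Chen et al., 2021): generate n ≥ k samples per problem, count the c samples that pass all tests, estimate pass@k = 1 − C(n−c, k) / C(n, k), and average over problems. The sketch below assumes that estimator; a given evaluation harness may compute the metric differently, and the function names here are illustrative.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed all tests, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    # 1 - P(a random size-k subset contains no correct sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # 10 samples per problem: 3 correct on the first, 10 correct on the second.
    print(mean_pass_at_k([(10, 3), (10, 10)], k=1))
```

With k = 1 the estimator reduces to c / n, so pass@1 is simply the per-problem fraction of correct samples averaged over the benchmark.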
Links
Alternatives
Backlog
- Include sanitized version (MBPP-sanitized) details.