MBPP (Mostly Basic Python Problems)

MBPP is a benchmark designed to evaluate the code generation performance of LLMs on basic Python tasks.

Description

The benchmark consists of 974 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers. Each problem includes a natural-language task description, a reference code solution, and three automated test cases written as assert statements.
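To make the problem structure concrete, the following is a minimal sketch of what one record might look like and how a candidate solution could be checked against its three test cases. The record itself is invented for illustration and is not taken from the dataset; the field names `text`, `code`, and `test_list` and the helper `passes_all_tests` are assumptions made for this sketch.

```python
# Hypothetical MBPP-style record (illustrative only, not a real dataset entry).
# The field names are assumptions for this sketch.
problem = {
    "text": "Write a function to find the minimum of two numbers.",
    "code": "def minimum(a, b):\n    return a if a <= b else b",
    "test_list": [
        "assert minimum(1, 2) == 1",
        "assert minimum(-5, -4) == -5",
        "assert minimum(0, 0) == 0",
    ],
}


def passes_all_tests(candidate_code, tests):
    """Return True if the candidate defines code that satisfies every test.

    Hypothetical helper: runs the candidate and each assert statement in a
    fresh namespace and treats any exception as a failure.
    """
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        for test in tests:
            exec(test, namespace)  # raises AssertionError on a failed test
    except Exception:
        return False
    return True


# The reference solution should pass; a deliberately wrong one should not.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
wrong_ok = passes_all_tests("def minimum(a, b):\n    return b", problem["test_list"])
```

Because `exec` runs arbitrary code, a real evaluation harness would normally execute model-generated solutions in a sandboxed process with a timeout rather than in the evaluator's own interpreter.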

Key Metrics

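Pass@1: Fraction of problems for which a single generated solution passes all of the problem's test cases.
Pass@k: Fraction of problems for which at least one of k generated solutions passes all of the problem's test cases.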
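As a rough illustration of how these metrics can be computed, the sketch below uses the unbiased pass@k estimator that is widely used for code generation benchmarks: generate n samples per problem, count the c samples that pass all tests, and estimate pass@k as 1 - C(n-c, k) / C(n, k). Whether a given MBPP evaluation uses this estimator or a simpler direct count is an assumption here; the function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all test cases
    k: number of attempts the metric allows (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples
    # contains no passing sample, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative numbers only: three problems, 10 samples each.
example_results = [(10, 3), (10, 0), (10, 10)]
score_at_1 = benchmark_pass_at_k(example_results, k=1)
```

With k = 1 the estimator reduces to c / n, so Pass@1 is simply the average fraction of passing samples per problem.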

Links

GitHub Repository

Alternatives

Backlog

Add details on the sanitized version (MBPP-sanitized).