MBPP (Mostly Basic Python Problems)

MBPP is a benchmark designed to evaluate the code generation performance of LLMs on basic Python tasks.

Description

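The benchmark consists of around 1,000 crowd-sourced Python programming problems (974 in the released dataset), designed to be solvable by entry-level programmers. Each problem includes a natural-language task description, a reference code solution, and three automated test cases written as assert statements. A generated solution counts as correct only if it passes all of a problem's test cases.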
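To make the problem structure concrete, here is a minimal sketch of one problem record and a checker that runs a candidate solution against its test cases. The record is a made-up example rather than an entry from the dataset, and the field names (`text`, `code`, `test_list`) and the helper `passes_all_tests` are assumptions chosen for illustration.

```python
# Hypothetical problem record: a task description, a reference solution,
# and three assert-based test cases. Not an actual MBPP entry.
problem = {
    "text": "Write a function to find the maximum of two numbers.",
    "code": "def max_of_two(a, b):\n    return a if a > b else b",
    "test_list": [
        "assert max_of_two(1, 2) == 2",
        "assert max_of_two(5, 3) == 5",
        "assert max_of_two(-1, -1) == -1",
    ],
}


def passes_all_tests(candidate_code, tests):
    """Return True if the candidate defines code that passes every test."""
    namespace = {}
    try:
        # Define the candidate's functions in an isolated namespace.
        exec(candidate_code, namespace)
        # Run each assert statement against that namespace.
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Any failed assert, syntax error, or runtime error counts as a failure.
        return False
    return True


reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests(
    "def max_of_two(a, b):\n    return a", problem["test_list"]
)
```

Here `reference_ok` is True and `buggy_ok` is False. Calling `exec` directly is only acceptable for trusted code; model-generated solutions should be run in a sandbox with a timeout.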

Key Metrics

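  • Pass@1: Fraction of problems for which a single generated solution passes all test cases.
  • Pass@k: Fraction of problems for which at least one of k sampled solutions passes all test cases.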
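Pass@k is often estimated by drawing n ≥ k samples per problem, counting the c samples that pass, and computing 1 - C(n-c, k) / C(n, k), the unbiased estimator introduced with the HumanEval benchmark. The sketch below assumes that estimator; this page does not say which one a given MBPP evaluation uses, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed all tests,
    k: number of attempts the metric allows.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset has a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average per-problem estimates; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 3, 1)` is approximately 0.3, the plain fraction of passing samples. The benchmark-level score is the mean of the per-problem estimates.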

Alternatives

Backlog

  • Include sanitized version (MBPP-sanitized) details.