GSM8K (Grade School Math 8K)
GSM8K is a benchmark for evaluating the multi-step mathematical reasoning capabilities of LLMs.
Description
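It contains about 8.5K high-quality, linguistically diverse grade school math word problems, split into 7,473 training and 1,319 test problems. Each problem takes between 2 and 8 steps to solve, using only basic arithmetic operations (addition, subtraction, multiplication, and division).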
Key Metrics
- Exact Match (EM): the fraction of problems for which the model's final numerical answer equals the reference answer.
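A minimal sketch of how EM can be computed, assuming the GSM8K convention that each reference solution ends with a line of the form `#### <answer>`. How the prediction is extracted from the model output is an assumption here: this sketch simply takes the last number in the completion. Evaluation harnesses differ on this step, so treat it as an illustration rather than the official scoring script. All function names are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Numeric tokens such as "18", "-3", "1,234", or "2.50".
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_decimal(token: str) -> Optional[Decimal]:
    """Parse a numeric token, dropping thousands separators; None if invalid."""
    try:
        return Decimal(token.replace(",", "").strip())
    except InvalidOperation:
        return None


def reference_answer(solution: str) -> Optional[Decimal]:
    """GSM8K reference solutions put the final answer after '####'."""
    if "####" not in solution:
        return None
    return to_decimal(solution.split("####")[-1])


def predicted_answer(completion: str) -> Optional[Decimal]:
    """Assumed heuristic: treat the last number in the output as the answer."""
    matches = NUMBER_RE.findall(completion)
    return to_decimal(matches[-1]) if matches else None


def exact_match(completions: list[str], solutions: list[str]) -> float:
    """Fraction of problems whose predicted answer equals the reference.

    Comparing Decimal values means "18" and "18.00" count as equal.
    """
    if len(completions) != len(solutions):
        raise ValueError("completions and solutions must have the same length")
    if not solutions:
        return 0.0
    hits = 0
    for completion, solution in zip(completions, solutions):
        pred = predicted_answer(completion)
        ref = reference_answer(solution)
        if pred is not None and ref is not None and pred == ref:
            hits += 1
    return hits / len(solutions)


if __name__ == "__main__":
    outputs = ["The answer is 1,000.", "I think 5"]
    references = ["... #### 1000", "... #### 6"]
    print(exact_match(outputs, references))  # one of two correct
```

Taking the last number is a common, simple choice for chain-of-thought style outputs, but it can mis-score completions that restate other quantities after the answer. Prompting the model to end with a fixed marker and parsing that marker is a stricter alternative.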
Links
Alternatives
Backlog
- Add info on Chain-of-Thought (CoT) prompting impact on GSM8K.