LM Evaluation Harness
A framework for few-shot evaluation of autoregressive language models.
Description
The harness provides a unified interface for evaluating autoregressive language models on hundreds of benchmark tasks, including MMLU, ARC, and HellaSwag.
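A minimal sketch of what a typical command-line invocation looks like, assuming the pip-installed `lm_eval` entry point with its documented `--model`, `--model_args`, `--tasks`, and `--batch_size` flags; the model name and task list below are placeholders for illustration, not recommendations.

```python
# Assemble a typical lm_eval CLI command (hedged sketch; the model
# checkpoint "EleutherAI/pythia-160m" is a placeholder example).
cmd = [
    "lm_eval",
    "--model", "hf",                                    # Hugging Face backend
    "--model_args", "pretrained=EleutherAI/pythia-160m",
    "--tasks", "mmlu,arc_easy,hellaswag",               # comma-separated task names
    "--batch_size", "8",
]
print(" ".join(cmd))
```

Running the printed command from a shell (with the harness installed) evaluates the chosen checkpoint on each listed task and reports per-task metrics.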
Links
Alternatives
Backlog
- Configure custom task suite for internal model evaluation.