Humanity's Last Exam (HLE)

HLE is a benchmark designed to test the limits of LLMs on the most difficult expert-level questions.

Description

It consists of highly complex, multidisciplinary questions that require deep expertise and advanced reasoning to solve. It is intended to remain challenging even as models continue to improve.

Alternatives

Backlog

  • Track performance of upcoming models (o1, Claude 3.5 Opus).