Humanity's Last Exam (HLE)
HLE is a new benchmark designed to test the limits of large language models (LLMs) on expert-level questions at the frontier of human knowledge.
Description
It consists of highly complex, multidisciplinary questions that require deep domain expertise and advanced reasoning to solve. It is designed to remain challenging even as models continue to improve.
Links
Alternatives
Backlog
- Track performance of upcoming models (o1, Claude 3.5 Opus).