Terminal-Bench

Terminal-Bench is a benchmark for evaluating AI agents' ability to use a terminal.

Description

It focuses on tasks that require interacting with a real terminal environment, such as installing software, debugging system issues, and managing files.
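
As a rough illustration of what a file-management task of this kind could look like, here is a minimal sketch in Python. It is not Terminal-Bench's actual harness or task format: the function name `run_file_task`, the shell commands, and the pass/fail check are all hypothetical and assume a POSIX shell is available. It only shows the general idea of running commands in a real shell and then verifying the resulting state.

```python
import subprocess
import tempfile
from pathlib import Path


def run_file_task(workdir: Path) -> bool:
    """Run a toy file-management task in workdir and check its outcome.

    Hypothetical example only; this is not the Terminal-Bench task format.
    """
    # Commands an agent might issue to complete the (made-up) task.
    commands = [
        "mkdir -p reports",
        "echo done > reports/status.txt",
    ]
    for command in commands:
        result = subprocess.run(
            command,
            shell=True,          # assumes a POSIX shell
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            return False

    # The task passes only if the expected file holds the expected text.
    status = workdir / "reports" / "status.txt"
    return status.is_file() and status.read_text().strip() == "done"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("task passed:", run_file_task(Path(tmp)))
```

The check inspects the final state of the file system rather than the commands themselves, so any command sequence that produces the same result would pass under this sketch.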

Alternatives

Backlog

  • Evaluate "Terminus 2" on new benchmark tasks.