Terminal-Bench
Terminal-Bench is a benchmark for evaluating AI agents' ability to use a terminal.
Description
It focuses on tasks that require interacting with a real terminal environment, such as installing software, debugging system issues, or managing files.
Links¶
Alternatives¶
Backlog¶
- Evaluate "Terminus 2" on new benchmark tasks.