Computational Physics
| # | Model | Agent Harness | Runner | Overall | Total Tokens | Latency | Cost | Score / $ | Status |
|---|---|---|---|---|---|---|---|---|---|
| 1 | gpt-5.4 | Codex CLI | Harbor | 77% | 933,458 | 531s wall | $0.7109 | 108.3 | PARTIAL |
| 2 | Qwen3.5-397B-A17B | Aider | Harbor | 76% | ~98,900 | 482s wall | ~$0.1270 | ~598.6 | PARTIAL |
| 3 | gpt-5.5 | Codex CLI | Harbor | 75.8% | 694,603 | 350s wall | $1.0889 | 69.6 | PARTIAL |
| 4 | deepseek-v4-pro | OpenCode | Harbor | 75% | 947,427 | 202s wall | $1.6794 | 44.7 | PARTIAL |
| 5 | Gemini 3.5 Flash | OpenCode | Harbor | 73.8% | 1,264,424 | 228s wall | $0.0532 | 1387.4 | PARTIAL |
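
The Score / $ column appears to be the Overall score (in percentage points) divided by the Cost in dollars. The page does not state this formula; it is inferred from the rows above. A minimal sketch under that assumption, where `score_per_dollar` is a hypothetical helper name and not part of any leaderboard tooling:

```python
def score_per_dollar(overall_pct: float, cost_usd: float) -> float:
    """Overall score (percentage points) divided by run cost in USD.

    Assumed relation, inferred from the table rows, not documented by the source.
    """
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return overall_pct / cost_usd


# Rows whose Overall and Cost are reported without a "~" marker.
rows = [
    ("gpt-5.4", 77.0, 0.7109),
    ("gpt-5.5", 75.8, 1.0889),
    ("deepseek-v4-pro", 75.0, 1.6794),
]

for model, overall, cost in rows:
    print(f"{model}: {score_per_dollar(overall, cost):.1f}")
```

For these three rows the result agrees with the listed Score / $ to one decimal place. The Qwen3.5-397B-A17B and Gemini 3.5 Flash rows differ by a few tenths, which is consistent with the displayed Overall and Cost values having been rounded or approximated.
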
Capability Radar
Multi-dimensional fingerprint of model strengths across the evaluation rubric.