What ChessBench measures

Autonomous Efficiency

With a capped number of iterations, models must use compiler feedback strategically and make decisive edits to build a working engine before running out of cycles.

Uncompromising Precision

The simulation is merciless to logic errors. A single mistake in move generation or board-state handling leads straight to an illegal move and a large hit to hard-earned Elo.

Algorithmic Dominance

Simply knowing the rules isn't enough to win. Champions are crowned for combining sophisticated search heuristics with deep performance optimizations.