Full ELO leaderboard

ChessBench standings

Ratings are computed from generated-engine games. Stockfish and internal reference engines may appear as calibration or sanity-check entries, but the benchmark is about LLM-created engines.

| Rank | Model | ELO | Score | Record | Avg opponent |
|---|---|---|---|---|---|
| 1 | Stockfish (Stockfish) | 3,626 | 661.5 / 681 (97.1%, 681 games) | 652W 19D 10L | 1,071 |
| 2 | Claude Opus 4.7 (Anthropic) | 2,135 | 618.5 / 680 (91.0%, 680 games) | 604W 29D 47L | 1,187 |
| 3 | Gemini 3.1 Pro Preview (Google) | 1,646 | 514.0 / 681 (75.5%, 681 games) | 465W 98D 118L | 1,222 |
| 4 | Gemini 3 Flash Preview (Google) | 1,618 | 503.0 / 681 (73.9%, 681 games) | 464W 78D 139L | 1,225 |
| 5 | GPT 5.4 (OpenAI) | 1,465 | 441.0 / 681 (64.8%, 681 games) | 383W 116D 182L | 1,236 |
| 6 | GPT 5.5 (OpenAI) | 1,402 | 413.0 / 681 (60.6%, 681 games) | 331W 164D 186L | 1,243 |
| 7 | Kimi K2.6 (Moonshot) | 1,377 | 401.0 / 680 (59.0%, 680 games) | 346W 110D 224L | 1,246 |
| 8 | Megalodon (Megalodon) | 1,300 | 387.0 / 677 (57.2%, 677 games) | 339W 96D 242L | 1,250 |
| 9 | Claude Sonnet 4.6 (Anthropic) | 1,043 | 287.0 / 680 (42.2%, 680 games) | 273W 28D 379L | 1,266 |
| 10 | Claude Haiku 4.5 (Anthropic) | 487 | 143.0 / 680 (21.0%, 680 games) | 143W 0D 537L | 1,308 |
| 11 | DeepSeek V4 Pro (DeepSeek) | 429 | 121.0 / 679 (17.8%, 679 games) | 121W 0D 558L | 1,319 |
| 12 | GPT 5.4 Mini (OpenAI) | 423 | 119.0 / 679 (17.5%, 679 games) | 119W 0D 560L | 1,316 |
| 13 | DeepSeek V4 Flash (DeepSeek) | 392 | 110.0 / 681 (16.2%, 681 games) | 110W 0D 571L | 1,312 |
| 14 | Gemini 3.1 Flash Lite (Google) | 187 | 42.0 / 681 (6.2%, 681 games) | 42W 0D 639L | 1,332 |
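
The Score and Record columns are consistent with standard chess scoring (one point per win, half a point per draw), so each row can be checked by hand. The sketch below reproduces that arithmetic for one row. The `Record` class and the `expected_score` helper are hypothetical names introduced here for illustration. The expected-score function is the textbook Elo formula on a 400-point scale; the page does not say how ChessBench actually fits its ratings, so treat that part as an assumption rather than the benchmark's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Win/draw/loss tally for one engine (hypothetical helper type)."""
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def score(self) -> float:
        # Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss.
        return self.wins + 0.5 * self.draws

    @property
    def score_pct(self) -> float:
        return 100.0 * self.score / self.games


def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B.

    Assumption: logistic curve with a 400-point scale. The leaderboard
    does not state which rating model it uses.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Row 2 of the table: Claude Opus 4.7, 604W 29D 47L.
opus = Record(wins=604, draws=29, losses=47)
print(opus.games, opus.score, round(opus.score_pct, 1))  # 680 618.5 91.0
```

Under the same assumed formula, `expected_score(2135, 1646)` gives the expected per-game score of the rank 2 engine against the rank 3 engine. A listed rating cannot be recovered from the average opponent rating and score percentage alone, since the spread of opponent strengths also matters.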