Full ELO leaderboard
ChessBench standings
Ratings are computed from generated-engine games. Stockfish and internal reference engines may appear as calibration or sanity-check entries, but the benchmark is about LLM-created engines.
| Rank | Model | Developer | ELO | Score | Score % | Games | Record (W/D/L) | Avg opponent |
|---|---|---|---|---|---|---|---|---|
| 1 | Stockfish | Stockfish | | 661.5 / 681 | 97.1% | 681 | 652W 19D 10L | 1,071 |
| 2 | Claude Opus 4.7 | Anthropic | | 618.5 / 680 | 91.0% | 680 | 604W 29D 47L | 1,187 |
| 3 | Gemini 3.1 Pro Preview | Google | | 514.0 / 681 | 75.5% | 681 | 465W 98D 118L | 1,222 |
| 4 | Gemini 3 Flash Preview | Google | | 503.0 / 681 | 73.9% | 681 | 464W 78D 139L | 1,225 |
| 5 | GPT 5.4 | OpenAI | | 441.0 / 681 | 64.8% | 681 | 383W 116D 182L | 1,236 |
| 6 | GPT 5.5 | OpenAI | | 413.0 / 681 | 60.6% | 681 | 331W 164D 186L | 1,243 |
| 7 | Kimi K2.6 | Moonshot | | 401.0 / 680 | 59.0% | 680 | 346W 110D 224L | 1,246 |
| 8 | Megalodon | Megalodon | | 387.0 / 677 | 57.2% | 677 | 339W 96D 242L | 1,250 |
| 9 | Claude Sonnet 4.6 | Anthropic | | 287.0 / 680 | 42.2% | 680 | 273W 28D 379L | 1,266 |
| 10 | Claude Haiku 4.5 | Anthropic | | 143.0 / 680 | 21.0% | 680 | 143W 0D 537L | 1,308 |
| 11 | DeepSeek V4 Pro | DeepSeek | | 121.0 / 679 | 17.8% | 679 | 121W 0D 558L | 1,319 |
| 12 | GPT 5.4 Mini | OpenAI | | 119.0 / 679 | 17.5% | 679 | 119W 0D 560L | 1,316 |
| 13 | DeepSeek V4 Flash | DeepSeek | | 110.0 / 681 | 16.2% | 681 | 110W 0D 571L | 1,312 |
| 14 | Gemini 3.1 Flash Lite | Google | | 42.0 / 681 | 6.2% | 681 | 42W 0D 639L | 1,332 |
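
The score column follows from the win/draw/loss record under standard chess scoring (1 point per win, half a point per draw). The page does not say how its ratings are fitted, so the sketch below is only an illustration: it recomputes one row's score and then applies the textbook Elo expected-score formula to get a rough performance estimate from the average opponent rating. The function names are hypothetical, and treating all opponents as a single average-rated opponent is a simplifying assumption, not ChessBench's method.

```python
import math


def score_points(wins: int, draws: int) -> float:
    """Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss."""
    return wins + 0.5 * draws


def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_estimate(score_fraction: float, avg_opponent: float) -> float:
    """Rating whose expected score against `avg_opponent` equals the
    observed score fraction (inverse of the Elo logistic curve).

    Assumption: the whole field is collapsed into one average-rated
    opponent, which is only an approximation.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    odds = score_fraction / (1.0 - score_fraction)
    return avg_opponent + 400.0 * math.log10(odds)


# Row 2 of the table: Claude Opus 4.7, 604W 29D 47L, avg opponent 1,187.
wins, draws, losses = 604, 29, 47
games = wins + draws + losses
points = score_points(wins, draws)
fraction = points / games
estimate = performance_estimate(fraction, 1187)

print(f"{points} / {games} = {fraction:.1%}")
print(f"rough performance estimate: {estimate:.0f}")
```

The recomputed score for this row is 618.5 / 680, matching the table. The performance estimate is not expected to match the published rating exactly, since a real rating fit would use every pairing rather than one averaged opponent.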