GameBench 2 — Current standings

Reasoning Variants

Name: DuelLab GameBench 2 public release gb2-20260713-37697e440dc9
Creator: DuelLab
License: https://creativecommons.org/licenses/by/4.0/

GameBench 2 is DuelLab's continuously updated benchmark for AI-generated game-playing programs.

Updated 2026-07-13

Reasoning Variants ranks each model and reasoning setting separately. The score is the mean score over usable plugins, multiplied by the plugin success rate raised to the power 1/4 (its fourth root). A variant needs at least four usable plugins and a success rate above 50% to receive an official rank; lower-coverage rows remain visible as provisional.
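The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not DuelLab's implementation: the function names, the inputs (a list of per-plugin scores plus a total plugin count), and the reading of "success rate" as usable plugins divided by total plugins are all assumptions.

```python
from statistics import fmean


def variant_score(usable_scores, total_plugins):
    """Mean over usable plugins, scaled by the quarter power of success rate.

    Assumption: success rate = usable plugins / total plugins.
    """
    if total_plugins <= 0:
        raise ValueError("total_plugins must be positive")
    if not usable_scores:
        return 0.0
    success_rate = len(usable_scores) / total_plugins
    return fmean(usable_scores) * success_rate ** 0.25


def is_officially_ranked(usable_count, total_plugins):
    """At least four usable plugins and strictly more than 50% success."""
    return usable_count >= 4 and usable_count / total_plugins > 0.5


# Hypothetical variant: 7 of 8 plugins usable, each scoring 72.2.
score = variant_score([72.2] * 7, 8)
print(round(score, 1))             # 69.8
print(is_officially_ranked(7, 8))  # True
print(is_officially_ranked(4, 8))  # False: exactly 50% is not "more than 50%"
```

Under these assumptions, a mean of 72.2 with 7 of 8 plugins usable gives 72.2 × 0.875^(1/4) ≈ 69.8. A variant with exactly half of its plugins usable stays provisional, because the threshold is strictly above 50%.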

Unique models: 34
Reasoning variants: 80
Public games: 8
Rated matches: 56,163

Rankings

Chart: current leaderboard score across public games, on a relative 0–100 scale, for the top 24 of 80 benchmarked reasoning variants. The plotted values are the Score entries for ranks 1 to 24 in the detailed leaderboard.

Detailed leaderboard

Reasoning Variants leaderboard for DuelLab Benchmark
Rank	Base	Model	Reasoning	Score	Playable	Best	Std. price	Price source	Codegen	n
1	1	Claude Fable 5	XHigh	73.7	73.7	100	$1.58	Standard list-price estimate	63% / 37% / 0%	8
2	2	GPT-5.6 Sol	XHigh	72.4	72.4	100 × 2	$10.06	Standard list-price estimate	100% / 0% / 0%	8
3	3	Claude Fable 5	Medium	69.8	72.2	86.6	$0.50	Standard list-price estimate	88% / 0% / 12%	8
4	4	Claude Opus 4.8	XHigh	68.6	70.9	90.5	$0.93	Standard list-price estimate	75% / 13% / 12%	8
5	5	GPT-5.5	XHigh	67.6	69.9	99.7	$2.74	Standard list-price estimate	88% / 0% / 12%	8
6	6	GPT-5.4	XHigh	66.7	66.7	100 × 3	$2.64	Standard list-price estimate	88% / 12% / 0%	8
7	7	GPT-5.6 Luna	XHigh	55.0	56.6	86.2	$3.16	Standard list-price estimate	78% / 11% / 11%	9
8	8	GPT-5.6 Sol	Medium	53.9	55.5	80.8	$0.47	Standard list-price estimate	78% / 11% / 11%	9
9	9	GPT-5.6 Terra	XHigh	51.8	51.8	79.7	$8.17	Standard list-price estimate	63% / 37% / 0%	8
10	10	Gemini 3.5 Flash	Medium	50.3	50.3	92.4	$0.35	Provider-reported	0% / 100% / 0%	8
11	11	GPT-5.4	None	49.3	49.3	93.9	$0.09	Standard list-price estimate	75% / 25% / 0%	8
12	12	GPT-5.4	Medium	49.1	49.1	100	$0.50	Standard list-price estimate	88% / 12% / 0%	8
13	13	GPT-5.4 Mini	XHigh	47.6	47.6	83.6	$1.02	Standard list-price estimate	83% / 17% / 0%	12
14	14	GPT-5.5	Medium	47.2	47.2	85.9	$0.44	Standard list-price estimate	88% / 12% / 0%	8
15	15	Claude Opus 4.5	None	45.9	45.9	78.9	$0.12	Standard list-price estimate	100% / 0% / 0%	8
16	16	GPT-5.4 Nano	XHigh	45.6	47.0	95.1	$0.29	Standard list-price estimate	59% / 29% / 12%	17
17	17	GPT 5	XHigh	44.4	44.4	82.3	$0.28	Standard list-price estimate	75% / 25% / 0%	8
18	18	Claude Sonnet 5	XHigh	43.9	43.9	70.3	$0.37	Standard list-price estimate	100% / 0% / 0%	8
19	19	Claude Opus 4.8	None	43.3	44.8	80.9	$0.16	Standard list-price estimate	88% / 0% / 12%	8
20	20	O3	Medium	42.7	45.9	71.3	$0.05	Standard list-price estimate	75% / 0% / 25%	8
21	21	Claude Opus 4.5	XHigh	42.4	43.8	71.3	$0.27	Standard list-price estimate	75% / 13% / 12%	8
22	22	DeepSeek V4 Pro	XHigh	42.3	45.4	85.7	$0.08	Provider-reported	13% / 62% / 25%	8
23	23	Claude Opus 4.8	Medium	41.5	41.5	91.7	$0.22	Standard list-price estimate	100% / 0% / 0%	8
24	24	MiMo-V2.5-Pro	Medium	40.5	43.5	70.9	$0.13	Provider-reported	0% / 75% / 25%	8
25	25	GPT 5	Medium	40.2	41.5	68.1	$0.18	Standard list-price estimate	75% / 13% / 12%	8
26	26	Qwen3.7 Plus	Medium	39.6	42.5	81.9	$0.05	Provider-reported	13% / 62% / 25%	8
27	27	Kimi K2.7 Code	Medium	39.2	42.1	59.6	$0.16	Provider-reported	0% / 75% / 25%	8
28	28	Claude Sonnet 5	None	38.9	38.9	62.7	$0.06	Standard list-price estimate	100% / 0% / 0%	8
29	29	Gemini 3.5 Flash	XHigh	38.8	41.7	88.6	$0.49	Provider-reported	0% / 75% / 25%	8
30	30	Minimax M3	None	38.4	41.3	60.3	$0.03	Provider-reported	50% / 25% / 25%	8
31	31	GPT-5.6 Terra	Medium	37.8	37.8	66.3	$0.17	Standard list-price estimate	63% / 37% / 0%	8
32	32	DeepSeek V4 Pro	None	37.4	38.6	75.9	$0.02	Provider-reported	50% / 38% / 12%	8
33	33	Claude Opus 4.5	Medium	37.2	37.2	59.5	$0.24	Standard list-price estimate	100% / 0% / 0%	8
34	34	GPT-5.6 Sol	None	37.2	38.3	61.6	$0.12	Standard list-price estimate	89% / 0% / 11%	9
35	35	GLM-5.2	XHigh	36.9	40.1	64.9	$0.21	Provider-reported	4% / 68% / 28%	25
36	36	GLM-5.2	Medium	36.6	40.5	63.6	$0.12	Provider-reported	13% / 54% / 33%	24
37	37	Claude Sonnet 5	Medium	36.5	37.8	64.7	$0.08	Standard list-price estimate	75% / 13% / 12%	8
38	38	GPT 4.1	None	36.5	36.5	58.4	$0.06	Standard list-price estimate	63% / 37% / 0%	8
39	39	MiMo-V2.5	None	36.5	39.7	44.5	$0.0029	Provider-reported	50% / 21% / 29%	24
40	40	GLM-5.2	None	36.3	37.6	49.1	$0.02	Mixed: Provider-reported + Standard list-price estimate	50% / 38% / 12%	32
41	41	Qwen3.7 Max	Medium	36.1	37.3	79.2	$0.09	Provider-reported	13% / 75% / 12%	8
42	42	Qwen3.7 Max	None	33.7	34.8	68.3	$0.02	Provider-reported	88% / 0% / 12%	8
43	43	GPT-5.4 Nano	Medium	33.6	35.7	52.7	$0.02	Standard list-price estimate	41% / 37% / 22%	32
44	44	MiMo-V2.5-Pro	None	33.1	34.4	55.0	$0.03	Provider-reported	57% / 29% / 14%	7
45	45	GPT-5.6 Terra	None	33.1	34.1	54.9	$0.07	Standard list-price estimate	78% / 11% / 11%	9
46	46	GPT-5.4 Mini	Medium	32.7	33.0	45.7	$0.24	Standard list-price estimate	92% / 4% / 4%	24
47	47	GPT-5.5	None	31.8	32.9	54.8	$0.14	Standard list-price estimate	88% / 0% / 12%	8
48	48	Grok Build 0.1	Medium	31.7	31.7	61.1	$0.06	Provider-reported	0% / 100% / 0%	8
49	49	Nemotron 3 Ultra 550B A55B	None	31.4	33.5	64.1	$0.03	Provider-reported	56% / 22% / 22%	9
50	50	GPT-5.6 Luna	Medium	30.2	31.1	70.5	$0.04	Standard list-price estimate	89% / 0% / 11%	9
51	51	Hy3 Preview	XHigh	28.6	30.4	65.1	$0.0032	Provider-reported	56% / 22% / 22%	9
52	52	Hy3 Preview	None	28.6	30.7	78.9	$0.0013	Provider-reported	50% / 25% / 25%	8
53	53	Gemma 4 31B	Medium	28.3	32.6	54.5	$0.0066	Provider-reported	0% / 57% / 43%	7
54	54	DeepSeek V4 Flash	XHigh	28.0	30.2	63.1	$0.01	Provider-reported	33% / 40% / 27%	15
55	55	GPT-OSS 120B	XHigh	27.7	31.1	76.9	$0.0088	Provider-reported	0% / 63% / 37%	8
56	56	North Mini Code	Medium	27.5	29.6	46.7	$0.00	Provider-reported	38% / 37% / 25%	8
57	57	Nemotron 3 Ultra 550B A55B	Medium	26.9	31.2	65.2	$0.04	Provider-reported	0% / 56% / 44%	9
58	58	Step 3.7 Flash	XHigh	25.9	29.1	50.0	$0.06	Provider-reported	0% / 63% / 37%	8
59	59	GPT-5.4 Mini	None	25.6	25.6	45.4	$0.02	Standard list-price estimate	79% / 21% / 0%	24
60	60	MiMo-V2.5	Medium	24.5	26.7	41.8	$0.02	Provider-reported	5% / 67% / 28%	21
61	61	GPT-5.4 Nano	None	23.0	24.5	35.0	$0.0090	Standard list-price estimate	31% / 47% / 22%	32
62	62	MiMo-V2.5	XHigh	22.1	24.9	52.8	$0.02	Provider-reported	0% / 63% / 37%	8
63	63	Gemini 3.1 Flash Lite	XHigh	21.7	24.4	51.6	$0.03	Provider-reported	0% / 63% / 37%	8
64	64	DeepSeek V4 Flash	None	21.6	22.9	34.2	$0.0022	Provider-reported	58% / 21% / 21%	24
65	65	Nex N2 Pro	Medium	21.3	24.0	41.1	$0.10	Provider-reported	0% / 63% / 37%	8
66	66	DeepSeek V4 Flash	Medium	21.0	21.7	40.6	$0.0032	Provider-reported	63% / 25% / 12%	16
67	67	Gemma 4 31B	None	20.8	21.6	38.2	$0.0024	Provider-reported	72% / 14% / 14%	7
68	68	Minimax M3	Medium	19.7	20.4	55.6	$0.07	Provider-reported	0% / 88% / 12%	8
69	69	Gemini 3.1 Flash Lite	Medium	19.3	20.8	35.4	$0.01	Provider-reported	25% / 50% / 25%	8
70	70	GPT-OSS 120B	Medium	19.3	19.9	34.6	$0.0025	Provider-reported	13% / 75% / 12%	8
71	71	Nemotron 3 Ultra 550B A55B	XHigh	18.9	21.8	39.5	$0.03	Provider-reported	12% / 44% / 44%	16
72	72	Qwen3.7 Plus	None	18.4	20.4	39.8	$0.0087	Provider-reported	45% / 22% / 33%	9
73	73	GPT-5.6 Luna	None	11.6	11.6	21.5	$0.02	Standard list-price estimate	88% / 12% / 0%	8
74	74	Gemini 3.1 Flash Lite	None	10.1	11.3	26.3	$0.0059	Provider-reported	38% / 25% / 37%	8
P	P	O3 (Provisional)	XHigh	57.0	67.8	75.2	$0.05	Standard list-price estimate	50% / 0% / 50%	8
P	P	Nex N2 Pro (Provisional)	None	31.6	38.9	77.0	$0.16	Provider-reported	0% / 44% / 56%	16
P	P	Step 3.7 Flash (Provisional)	Medium	22.3	26.6	43.2	$0.04	Provider-reported	0% / 50% / 50%	8
P	P	North Mini Code (Provisional)	None	18.0	24.0	25.0	$0.00	Provider-reported	19% / 12% / 69%	16
P	P	Mistral Medium 3.5 (Provisional)	None	15.5	18.5	25.4	$0.04	Provider-reported	25% / 25% / 50%	8
P	P	Mistral Medium 3.5 (Provisional)	XHigh	15.1	18.5	23.9	$0.19	Provider-reported	22% / 22% / 56%	9