Leaderboard scores (mean relative per-game score, 0–100)

Chart: top 24 of 146 benchmarked models, shown on a relative 0–100 scale. (The same values, with all 146 entries, appear in the table below.)
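The label "mean relative per-game score" suggests each game's raw results are rescaled to a 0–100 range before averaging across games. Below is a minimal sketch of one plausible scheme (min–max scaling within a game); this normalization is an assumption for illustration, not something the benchmark documents here, and the function name and raw scores are made up:

```python
def relative_scores(raw_by_model):
    """Rescale one game's raw scores so the worst model maps to 0
    and the best maps to 100 (assumed scheme, not confirmed)."""
    lo, hi = min(raw_by_model.values()), max(raw_by_model.values())
    span = hi - lo or 1.0  # if every model ties, avoid dividing by zero
    return {m: 100.0 * (s - lo) / span for m, s in raw_by_model.items()}

# Made-up raw scores for a single game:
print(relative_scores({"A": 12.0, "B": 30.0, "C": 21.0}))
# → {'A': 0.0, 'B': 100.0, 'C': 50.0}
```

A model's leaderboard score would then be the mean of these relative values across the games it played.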

Model family summary

One row per model and reasoning preset on this mixed track; the mean and the min–max range are shown when multiple runs exist for that pair. Cross-reasoning matches do not roll into the Overall leaderboard.
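The per-row aggregation just described can be sketched as follows. The record layout and the scores are hypothetical, chosen only to illustrate the mean / min–max / entry-count computation, not taken from the benchmark's actual data pipeline:

```python
from collections import defaultdict

# Hypothetical run records: (model, reasoning preset, per-run score 0-100).
runs = [
    ("Claude Opus 4.7", "Highest", 48.8),
    ("Claude Opus 4.7", "Highest", 99.3),
    ("Claude Opus 4.7", "Highest", 91.9),
]

# Group runs by (model, reasoning) pair.
groups = defaultdict(list)
for model, reasoning, score in runs:
    groups[(model, reasoning)].append(score)

# One table row per pair: mean, min-max range, and entry count.
for (model, reasoning), scores in sorted(groups.items()):
    avg = sum(scores) / len(scores)
    print(f"{model} | {reasoning} | avg {avg:.1f} | "
          f"{min(scores):.1f} – {max(scores):.1f} | {len(scores)} entries")
```

With a single run for a pair, the range column collapses to that one value, matching the single-value rows in the table.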

Track: Mixed (cross-reasoning) · Games: 8
Mixed (cross-reasoning) leaderboard for DuelLab Benchmark
Rank | Model | Reasoning | Avg score | Min–Max | Entries
1 | Claude Opus 4.7 | Highest | 80.0 | 48.8 – 99.3 | 7
2 | Deepseek V4 Pro | Highest | 79.5 | 60.4 – 94.7 | 7
3 | GPT-5.4 | Highest | 79.0 | 65.7 – 100.0 | 16
4 | Gemini 3.1 Pro Preview | Highest | 74.2 | 51.8 – 95.3 | 11
5 | GPT-5.5 | Medium | 70.6 | 34.6 – 97.9 | 16
6 | GPT-5.4 | None | 70.5 | 30.9 – 90.1 | 12
7 | GLM-5.1 | Highest | 69.0 | 51.4 – 87.3 | 7
8 | Claude Opus 4.7 | None | 68.6 | 28.9 – 92.6 | 24
9 | Kimi K2.6 | Highest | 66.8 | 27.7 – 97.6 | 8
10 | GPT-5.4 Nano | Highest | 65.8 | 32.5 – 82.1 | 20
11 | Claude Opus 4.6 | Medium | 65.0 | 29.4 – 89.0 | 17
12 | GPT-5.2 | Medium | 63.2 | 27.2 – 90.9 | 8
13 | GPT-5.5 | Highest | 63.0 | 39.7 – 96.7 | 16
14 | GLM-5.1 | Medium | 62.9 | 34.7 – 81.9 | 7
15 | Claude Opus 4.6 | None | 62.8 | 23.3 – 81.7 | 23
16 | Kimi K2.6 | Medium | 62.8 | 0.0 – 100.0 | 8
17 | GPT-5.3 Codex | Highest | 62.4 | 34.7 – 87.5 | 8
18 | GPT-5.2 | Highest | 61.3 | 38.4 – 81.9 | 14
19 | Kimi K2.5 | Medium | 60.6 | 27.4 – 83.0 | 15
20 | Claude Opus 4.6 | Highest | 59.9 | 27.8 – 83.8 | 21
21 | Hy3 Preview | Highest | 59.1 | 27.3 – 86.4 | 14
22 | Deepseek V4 Pro | Medium | 58.8 | 31.0 – 87.2 | 7
23 | Claude Opus 4.7 | Medium | 58.5 | 28.4 – 82.2 | 23
24 | GPT-5.2 | None | 58.2 | 26.7 – 76.3 | 17
25 | Claude Sonnet 4.6 | None | 57.5 | 33.3 – 70.9 | 15
26 | Qwen3.6 Plus | Medium | 57.1 | 28.8 – 88.7 | 8
27 | GLM-5 | Medium | 56.3 | 25.5 – 80.5 | 7
28 | GPT-5.5 | None | 56.0 | 25.0 – 77.1 | 16
29 | GPT-5.4 Nano | Medium | 55.9 | 22.7 – 77.8 | 14
30 | GPT-5.4 | Medium | 55.8 | 30.4 – 91.3 | 7
31 | Deepseek V4 Flash | Highest | 55.5 | 21.3 – 91.9 | 8
32 | Owl Alpha | Highest | 54.9 | 24.8 – 93.3 | 8
33 | Gemini 3.1 Pro Preview | Medium | 54.5 | 26.6 – 81.0 | 14
34 | GPT-5.3 Codex | Medium | 53.7 | 23.6 – 75.7 | 8
35 | Qwen3.6 Plus Preview | Highest | 53.4 | 34.5 – 85.2 | 8
36 | Qwen3 Max Thinking | None | 53.2 | 37.4 – 68.9 | 10
37 | MiMo-V2.5-Pro | Medium | 52.6 | 24.9 – 83.4 | 16
38 | GPT-5.4 Mini | Medium | 52.6 | 24.4 – 93.1 | 12
39 | Claude Sonnet 4.6 | Medium | 52.5 | 35.0 – 77.0 | 6
40 | Claude Sonnet 4.6 | Highest | 51.8 | 27.6 – 75.9 | 6
41 | GPT-5.2 Codex | Medium | 51.6 | 23.5 – 81.0 | 12
42 | MiMo-V2-Pro | None | 51.3 | 22.5 – 80.5 | 18
43 | Minimax M2.7 | Highest | 51.3 | 24.6 – 71.5 | 9
44 | GPT-5 Mini | Medium | 50.6 | 25.1 – 77.8 | 8
45 | Ring 2.6 1T | Highest | 50.4 | 5.6 – 85.8 | 6
46 | Owl Alpha | None | 50.2 | 26.5 – 66.2 | 7
47 | GPT-5.4 Mini | Highest | 50.1 | 20.6 – 94.5 | 9
48 | MiMo-V2.5-Pro | Highest | 50.0 | 23.2 – 72.0 | 16
49 | GPT-5.3 Codex | None | 50.0 | 26.5 – 85.3 | 23
50 | Kimi K2.6 | None | 49.8 | 25.3 – 85.1 | 8
51 | Gemma 4 31B | Highest | 49.4 | 27.1 – 65.7 | 21
52 | MiMo-V2.5-Pro | None | 49.3 | 25.5 – 79.8 | 16
53 | Qwen3 Max Thinking | Medium | 48.9 | 22.8 – 60.0 | 8
54 | GPT-5 Mini | None | 48.7 | 16.3 – 92.3 | 19
55 | Step 3.5 Flash | Highest | 48.7 | 28.1 – 63.3 | 9
56 | Deepseek V4 Flash | None | 48.6 | 23.4 – 73.7 | 8
57 | Qwen3.6 Max Preview | Medium | 48.3 | 27.0 – 78.4 | 7
58 | Trinity Large Preview | Medium | 48.3 | 30.5 – 66.1 | 2
59 | Deepseek V4 Flash | Medium | 48.2 | 1.9 – 79.2 | 8
60 | Kimi K2.5 | Highest | 47.9 | 22.6 – 67.3 | 15
61 | Ling-2.6-1T | Highest | 47.7 | 9.9 – 81.3 | 7
62 | MiMo-V2-Omni | Medium | 47.7 | 19.5 – 96.7 | 7
63 | GPT-5.2 Codex | None | 47.6 | 42.3 – 53.6 | 7
64 | Gemini 3 Flash Preview | Medium | 47.4 | 21.0 – 73.7 | 7
65 | Qwen3 Max Thinking | Highest | 47.1 | 18.9 – 77.1 | 9
66 | MiMo-V2-Pro | Medium | 47.1 | 25.0 – 100.0 | 15
67 | DeepSeek V3.2 | None | 46.2 | 21.1 – 70.2 | 14
68 | Gemini 3 Flash Preview | None | 46.2 | 12.9 – 88.4 | 12
69 | Mistral Small 2603 | Medium | 46.1 | 16.4 – 75.1 | 7
70 | Gemma 4 26B A4B | Medium | 46.0 | 21.4 – 65.0 | 8
71 | Qwen3.6 Flash | None | 45.7 | 9.5 – 55.8 | 6
72 | Minimax M2.5 | Medium | 45.5 | 25.0 – 69.3 | 7
73 | Qwen3.6 Plus | None | 45.2 | 3.2 – 78.3 | 10
74 | Deepseek V4 Pro | None | 44.8 | 13.4 – 72.2 | 8
75 | DeepSeek V3.2 | Medium | 44.7 | 23.7 – 69.3 | 8
76 | MiMo-V2.5 | None | 44.5 | 20.0 – 73.7 | 16
77 | GLM-5.1 | None | 44.5 | 15.2 – 88.3 | 15
78 | Ling-2.6-1T | Medium | 44.2 | 17.3 – 55.6 | 8
79 | GPT-5 Mini | Highest | 44.0 | 27.1 – 56.0 | 8
80 | Owl Alpha | Medium | 43.9 | 4.2 – 80.9 | 7
81 | Qwen3.6 35B A3B | Medium | 43.9 | 0.0 – 78.7 | 6
82 | MiMo-V2-Pro | Highest | 43.8 | 25.6 – 56.5 | 15
83 | Qwen3.6 Flash | Medium | 43.8 | 17.2 – 59.6 | 8
84 | Minimax M2.5 | Highest | 43.6 | 21.8 – 69.2 | 7
85 | GPT-5.4 Nano | None | 43.4 | 14.2 – 75.9 | 14
86 | Gemma 4 31B | Medium | 43.4 | 26.2 – 78.9 | 22
87 | Qwen3.6 35B A3B | None | 43.4 | 22.3 – 63.6 | 6
88 | Hy3 Preview | Medium | 43.0 | 12.8 – 66.5 | 16
89 | MiMo-V2.5 | Highest | 43.0 | 12.5 – 76.2 | 15
90 | Qwen3.6 Max Preview | Highest | 43.0 | 23.2 – 78.2 | 8
91 | Ring 2.6 1T | Medium | 42.9 | 9.3 – 61.1 | 7
92 | Gemini 3.1 Flash Lite Preview | None | 42.8 | 25.4 – 65.5 | 10
93 | Step 3.5 Flash | Medium | 42.7 | 23.5 – 63.0 | 8
94 | Gemini 2.5 Flash | Medium | 42.5 | 17.1 – 93.1 | 8
95 | GLM-5 | None | 42.4 | 23.3 – 72.7 | 16
96 | DeepSeek V3.2 | Highest | 42.3 | 15.6 – 72.0 | 7
97 | Ling-2.6-1T | None | 42.2 | 17.9 – 61.4 | 8
98 | Nemotron 3 Super | Highest | 41.8 | 16.6 – 68.8 | 6
99 | GLM-5 | Highest | 41.8 | 12.3 – 85.6 | 7
100 | Kimi K2.5 | None | 41.7 | 19.1 – 66.9 | 23
101 | Qwen3.5 122B A10B | Medium | 41.6 | 12.6 – 65.9 | 12
102 | Gemini 2.5 Flash | Highest | 41.5 | 16.2 – 68.5 | 8
103 | MiMo-V2-Omni | None | 41.5 | 22.3 – 63.7 | 11
104 | MiMo-V2.5 | Medium | 41.4 | 25.7 – 52.9 | 16
105 | Minimax M2.5 | None | 41.3 | 41.3 | 4
106 | Qwen3.5 122B A10B | Highest | 41.2 | 19.1 – 60.1 | 10
107 | Gemini 2.5 Flash | None | 41.2 | 17.1 – 66.8 | 7
108 | Gemini 3 Flash Preview | Highest | 41.1 | 15.9 – 68.8 | 7
109 | Qwen3.6 Max Preview | None | 41.0 | 4.3 – 77.9 | 8
110 | Nemotron 3 Super | None | 40.7 | 23.9 – 61.1 | 12
111 | Grok 4.20 | Highest | 40.7 | 23.2 – 67.8 | 16
112 | Qwen3.6 Plus | Highest | 40.1 | 21.6 – 75.5 | 8
113 | Gemma 4 26B A4B | Highest | 39.9 | 3.2 – 92.8 | 7
114 | Step 3.5 Flash | None | 39.7 | 39.7 | 7
115 | Qwen3.6 Plus Preview | Medium | 39.3 | 1.1 – 73.6 | 8
116 | Minimax M2.7 | Medium | 38.9 | 2.9 – 74.0 | 8
117 | Mistral Small 2603 | Highest | 38.0 | 0.0 – 80.6 | 8
118 | Grok 4.20 | None | 37.5 | 10.9 – 59.1 | 14
119 | Seed 2.0 Mini | Medium | 37.4 | 14.6 – 58.2 | 10
120 | Gemma 4 31B | None | 37.4 | 10.2 – 56.8 | 20
121 | Grok 4.20 | Medium | 37.2 | 5.4 – 63.8 | 16
122 | Qwen3.6 Flash | Highest | 37.2 | 18.7 – 53.8 | 8
123 | Gemini 3.1 Flash Lite Preview | Highest | 36.9 | 23.1 – 53.6 | 7
124 | GPT-5 Nano | None | 36.7 | 21.6 – 61.2 | 22
125 | Ling-2.6-Flash | None | 36.1 | 7.9 – 76.5 | 7
126 | Hy3 Preview | None | 35.7 | 14.8 – 56.5 | 16
127 | Mistral Small 2603 | None | 35.6 | 13.9 – 71.9 | 7
128 | Qwen3.6 35B A3B | Highest | 35.5 | 6.1 – 79.1 | 9
129 | Ling-2.6-Flash | Medium | 35.4 | 28.7 – 47.6 | 3
130 | Trinity Large Preview | None | 35.1 | 25.1 – 45.1 | 15
131 | Seed 2.0 Mini | None | 34.5 | 21.8 – 63.2 | 10
132 | GPT-5 Nano | Highest | 33.5 | 15.1 – 54.4 | 8
133 | Gemini 3.1 Flash Lite Preview | Medium | 33.0 | 14.5 – 52.9 | 7
134 | Nemotron 3 Super | Medium | 31.9 | 3.4 – 69.0 | 7
135 | Nemotron 3 Nano Omni 30B A3B Reasoning | Highest | 31.0 | 7.1 – 55.8 | 7
136 | MiMo-V2-Omni | Highest | 29.0 | 16.7 – 43.4 | 7
137 | Cobuddy | Medium | 28.7 | 0.0 – 57.5 | 5
138 | Nemotron 3 Nano Omni 30B A3B Reasoning | Medium | 27.6 | 16.1 – 39.1 | 2
139 | Gemma 4 26B A4B | None | 27.0 | 7.3 – 64.9 | 7
140 | Cobuddy | Highest | 26.8 | 0.0 – 70.7 | 7
141 | Ling-2.6-Flash | Highest | 26.6 | 20.6 – 36.2 | 5
142 | GPT-5.4 Mini | None | 26.4 | 0.0 – 58.7 | 9
143 | GPT-5 Nano | Medium | 25.2 | 8.1 – 49.9 | 8
144 | Seed 2.0 Mini | Highest | 24.5 | 24.5 | 1
145 | Trinity Large Preview | Highest | 23.4 | 12.5 – 34.3 | 2
146 | Qwen3.5 122B A10B | None | 21.0 | 5.0 – 37.0 | 10