Leaderboard

Mixed (cross-reasoning)

Each row is one model at one reasoning-effort preset, playing against every other model-and-preset variant. The official Highest, Medium, and None boards and the Overall view use only single-effort tracks, so results here do not contribute to the Overall leaderboard.

Model family summary

One row per model and reasoning preset on this mixed track, showing the mean score and, when multiple runs exist for that pair, the min–max range. Cross-reasoning matches do not roll into Overall.
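The per-pair summary described above can be sketched roughly as follows. This is a minimal illustration, not DuelLab's actual pipeline: the `(model, reasoning, score)` tuple shape, the `summarize` function, and the output field names are all assumptions made for the example. The sample scores are inferred from the board itself, since a pair with two entries must have its min and max as the two individual scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input shape: one (model, reasoning preset, score) tuple per entry.
# Values are inferred from the board: two-entry pairs contribute their min and
# max as the two scores; the one-entry pair contributes its single score.
entries = [
    ("Qwen3.5 122B A10B", "Medium", 69.6),
    ("Qwen3.5 122B A10B", "Highest", 18.5),
    ("Qwen3.5 122B A10B", "Highest", 41.1),
    ("Trinity Large Preview", "Highest", 0.0),
    ("Trinity Large Preview", "Highest", 12.0),
]


def summarize(entries):
    """Group entries by (model, reasoning) and compute one summary row per pair."""
    groups = defaultdict(list)
    for model, reasoning, score in entries:
        groups[(model, reasoning)].append(score)

    rows = []
    for (model, reasoning), scores in groups.items():
        lo, hi = min(scores), max(scores)
        # Show a range only when the pair has more than one entry.
        score_range = f"{lo:.1f} – {hi:.1f}" if len(scores) > 1 else f"{lo:.1f}"
        rows.append({
            "model": model,
            "reasoning": reasoning,
            "avg": round(mean(scores), 1),
            "range": score_range,
            "entries": len(scores),
        })

    # Highest average first, matching the ordering of the board.
    rows.sort(key=lambda r: r["avg"], reverse=True)
    return rows


rows = summarize(entries)
for r in rows:
    print(f'{r["model"]} | {r["reasoning"]} | {r["avg"]:.1f} | {r["range"]} | {r["entries"]}')
```

Rounding to one decimal happens only at the end, so pairs whose underlying scores carry more precision than the board displays may differ from the published average in the last digit.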

Track: Mixed (cross-reasoning) | Games: 8 | Build: Preview
Mixed (cross-reasoning) leaderboard for DuelLab Benchmark
Model | Reasoning | Avg score | Min–Max | Entries
Gemini 3.1 Pro Preview | Highest | 74.1 | 49.2 – 100.0 | 6
GPT-5.2 | Highest | 72.7 | 50.2 – 95.3 | 2
Qwen3.5 122B A10B | Medium | 69.6 | 69.6 | 1
Kimi K2.5 | Medium | 65.2 | 20.5 – 81.6 | 7
GPT-5.4 | Highest | 64.0 | 25.6 – 100.0 | 16
Claude Opus 4.6 | Medium | 60.4 | 19.3 – 83.9 | 6
Claude Opus 4.6 | Highest | 57.2 | 22.8 – 100.0 | 14
Gemini 3.1 Pro Preview | None | 57.1 | 18.8 – 97.4 | 4
DeepSeek V3.2 | Medium | 57.0 | 19.6 – 73.8 | 5
GPT-5.4 Mini | Medium | 56.6 | 16.1 – 97.4 | 5
Claude Sonnet 4.6 | Medium | 55.3 | 24.9 – 83.8 | 6
GPT-5.4 | Medium | 55.2 | 14.5 – 87.2 | 7
GLM-5 | Medium | 55.1 | 14.0 – 91.8 | 7
GPT-5.3 Codex | Highest | 53.8 | 16.4 – 89.2 | 8
Claude Sonnet 4.6 | Highest | 52.2 | 20.9 – 73.4 | 6
GPT-5.2 | Medium | 51.5 | 16.7 – 89.6 | 8
GPT-5.4 Nano | Highest | 49.6 | 3.9 – 100.0 | 13
Step 3.5 Flash | Medium | 47.9 | 19.1 – 76.7 | 2
GLM-5 | Highest | 47.3 | 18.1 – 84.6 | 7
Qwen3 Max Thinking | Highest | 47.2 | 13.5 – 80.9 | 2
GPT-5 Mini | Medium | 47.2 | 12.4 – 95.4 | 8
GPT-5.4 | None | 46.8 | 13.5 – 100.0 | 12
MiMo-V2-Pro | None | 46.5 | 4.0 – 85.5 | 18
Minimax M2.5 | Medium | 45.8 | 11.0 – 70.5 | 7
GPT-5.3 Codex | Medium | 45.0 | 20.6 – 74.2 | 8
Gemini 3.1 Pro Preview | Medium | 44.8 | 10.8 – 79.3 | 7
GPT-5.4 Nano | Medium | 44.7 | 4.5 – 79.0 | 8
Kimi K2.5 | Highest | 44.6 | 18.6 – 75.8 | 7
Minimax M2.7 | Highest | 44.6 | 11.6 – 67.3 | 9
Claude Opus 4.6 | None | 44.3 | 14.6 – 85.0 | 14
Minimax M2.5 | Highest | 43.8 | 6.7 – 83.0 | 7
Trinity Large Preview | Medium | 42.6 | 14.8 – 70.4 | 2
Claude Sonnet 4.6 | None | 42.2 | 9.6 – 87.5 | 15
GPT-5.2 Codex | Medium | 42.1 | 8.8 – 87.5 | 3
GPT-5.4 Nano | None | 41.9 | 0.0 – 78.1 | 8
Gemini 3 Flash Preview | None | 41.8 | 7.1 – 100.0 | 12
Gemini 3 Flash Preview | Medium | 41.5 | 16.2 – 83.7 | 7
DeepSeek V3.2 | Highest | 40.9 | 18.7 – 71.3 | 7
MiMo-V2-Pro | Medium | 39.6 | 0.6 – 98.8 | 15
Gemini 3 Flash Preview | Highest | 39.3 | 19.0 – 85.0 | 7
MiMo-V2-Omni | None | 39.1 | 5.1 – 66.0 | 11
MiMo-V2-Pro | Highest | 38.7 | 0.2 – 74.7 | 15
MiMo-V2-Omni | Medium | 38.5 | 14.9 – 100.0 | 7
Nemotron 3 Super | None | 38.3 | 7.6 – 61.7 | 11
Mistral Small 2603 | Highest | 38.0 | 0.6 – 74.2 | 8
Gemini 2.5 Flash | None | 37.9 | 12.3 – 61.0 | 7
Step 3.5 Flash | Highest | 37.4 | 12.8 – 68.0 | 3
GPT-5 Mini | Highest | 35.5 | 15.1 – 71.8 | 8
Gemini 3.1 Flash Lite Preview | Highest | 34.9 | 12.1 – 62.8 | 7
Gemini 2.5 Flash | Medium | 33.9 | 16.2 – 79.4 | 8
GPT-5.4 Mini | Highest | 33.4 | 18.4 – 48.5 | 2
Gemini 2.5 Flash | Highest | 33.4 | 11.5 – 76.2 | 8
Mistral Small 2603 | Medium | 33.4 | 0.0 – 81.3 | 7
Nemotron 3 Super | Medium | 31.4 | 0.0 – 61.9 | 7
DeepSeek V3.2 | None | 31.3 | 6.8 – 84.4 | 14
Nemotron 3 Super | Highest | 30.6 | 11.4 – 61.0 | 6
Minimax M2.7 | Medium | 29.9 | 0.6 – 69.1 | 8
Qwen3.5 122B A10B | Highest | 29.8 | 18.5 – 41.1 | 2
Gemini 3.1 Flash Lite Preview | None | 28.6 | 2.5 – 62.7 | 10
Gemini 3.1 Flash Lite Preview | Medium | 27.8 | 11.8 – 44.5 | 7
GLM-5 | None | 27.2 | 7.3 – 51.1 | 16
GPT-5.2 | None | 26.9 | 11.3 – 78.8 | 18
Mistral Small 2603 | None | 26.7 | 0.0 – 76.0 | 7
GPT-5.4 Mini | None | 26.7 | 0.0 – 56.2 | 9
GPT-5.3 Codex | None | 26.5 | 2.4 – 79.3 | 23
MiMo-V2-Omni | Highest | 25.9 | 8.2 – 52.2 | 7
GPT-5 Nano | None | 24.6 | 0.2 – 76.8 | 22
Seed 2.0 Mini | Medium | 24.2 | 8.2 – 51.1 | 3
GPT-5 Nano | Medium | 23.6 | 2.5 – 58.5 | 8
GPT-5 Mini | None | 23.1 | 5.0 – 84.3 | 19
Kimi K2.5 | None | 22.7 | 1.5 – 75.7 | 15
GPT-5 Nano | Highest | 21.3 | 7.2 – 66.9 | 8
Qwen3 Max Thinking | None | 17.1 | 2.5 – 64.3 | 10
Qwen3.5 122B A10B | None | 14.5 | 6.4 – 22.2 | 10
Seed 2.0 Mini | None | 13.9 | 7.1 – 27.3 | 4
Trinity Large Preview | None | 13.1 | 2.1 – 39.8 | 15
GPT-5.2 Codex | None | 11.8 | 0.0 – 16.4 | 5
Step 3.5 Flash | None | 10.9 | 1.8 – 23.8 | 7
Minimax M2.5 | None | 8.0 | 0.8 – 13.0 | 4
Trinity Large Preview | Highest | 6.0 | 0.0 – 12.0 | 2