Visual analytics

Charts

Interactive views of benchmark results: reasoning levels, score spread by reasoning mode, per-game strength, and economics once Game 09 or later is in the benchmark. All figures below share the same model selection.

Which models appear

Filter, toggle checkboxes, or use Top 12 / All / Clear. The same selection updates every chart.

Reasoning levels

Applies to Scores, Economics (when available), and Per game below.

Overall min–max and mean; each small dot is one game (averaged across the reasoning levels where scores exist).
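The aggregation behind this caption could be sketched roughly as follows. This is a minimal illustration, not the page's actual code: the data layout, the names `scores`, `per_game_averages`, and `overall_summary`, and every number are hypothetical. It also assumes the overall min, max, and mean are taken over every recorded score, which the caption does not confirm.

```python
from statistics import mean

# Hypothetical scores for one model: game -> {reasoning level -> score or None}.
# None marks a reasoning level with no recorded score. All values are made up.
scores = {
    "Game 01": {"low": 40.0, "medium": 50.0, "high": 60.0},
    "Game 02": {"low": 70.0, "medium": None, "high": 90.0},
    "Game 03": {"low": None, "medium": None, "high": None},
}


def per_game_averages(game_scores):
    """Average each game across the reasoning levels that have a score (the small dots)."""
    averages = {}
    for game, by_level in game_scores.items():
        present = [s for s in by_level.values() if s is not None]
        if present:  # a game with no scores at all gets no dot
            averages[game] = mean(present)
    return averages


def overall_summary(game_scores):
    """Min, max, and mean over every recorded score (assumed definition of the overall bar)."""
    recorded = [
        s
        for by_level in game_scores.values()
        for s in by_level.values()
        if s is not None
    ]
    if not recorded:
        return None
    return {"min": min(recorded), "max": max(recorded), "mean": mean(recorded)}


dots = per_game_averages(scores)
summary = overall_summary(scores)
print(dots)
print(summary)
```

Skipping missing levels instead of treating them as zero keeps a game with partial coverage from being pulled down artificially, which matches the "where scores exist" wording.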
Coming soon

The estimated cost versus score chart will be added once the benchmark includes Game 09 or later.

Strength heatmap (per-game scores averaged across reasoning levels)