Arena Releases Full Historical AI Leaderboard Dataset
- Arena releases full three-year historical dataset of AI model leaderboards on Hugging Face
- Data covers ten arenas including text, vision, and video across hundreds of models
- Release enables longitudinal analysis of model performance and open-source versus proprietary trends
The Arena team has officially opened its archives, releasing a comprehensive dataset encompassing three years of AI benchmarking history across ten distinct arenas. This public-access repository, hosted on Hugging Face, provides a granular look at how hundreds of models have evolved since May 2023. By moving beyond static snapshots, researchers can now track the "march of progress" as top-tier model scores climbed from roughly 1,000 to nearly 1,500 points in less than three years.
The dataset is organized into subsets covering different modalities, such as text, vision, and video generation, and uses splits to separate the most recent rankings from the complete historical record. This structure supports longitudinal analysis: studies that track how the same variables change over long periods. For example, users can now visualize the rapid growth in model variety or compare the prevalence of open-source versus proprietary licenses across domains like coding or image editing.
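As a minimal sketch of the kind of longitudinal analysis described above, the snippet below computes a top-score-over-time curve and an open-license share from a small synthetic table of leaderboard rows. The column names (`date`, `model`, `score`, `license`) and the values are illustrative assumptions, not the dataset's actual schema; in practice the rows would come from the Hugging Face release.

```python
# Hypothetical sketch of longitudinal leaderboard analysis.
# Column names and values are assumptions, not the dataset's real schema.
import pandas as pd

# Synthetic stand-in for historical leaderboard snapshots.
rows = pd.DataFrame({
    "date":    ["2023-05", "2023-05", "2024-05", "2024-05", "2026-01", "2026-01"],
    "model":   ["m1", "m2", "m3", "m4", "m5", "m6"],
    "score":   [1010, 990, 1250, 1180, 1480, 1430],
    "license": ["proprietary", "open", "proprietary", "open", "open", "proprietary"],
})

# Top score per snapshot: the "march of progress" curve.
top_scores = rows.groupby("date")["score"].max()

# Fraction of open-licensed models per snapshot.
open_share = (
    rows.assign(is_open=rows["license"].eq("open"))
        .groupby("date")["is_open"]
        .mean()
)

print(top_scores.to_dict())
print(open_share.to_dict())
```

With the real dataset, the same groupby pattern applied across each arena's historical split would reproduce the trends the article describes, such as top scores climbing from roughly 1,000 toward 1,500.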
Beyond raw rankings, the release includes style-controlled variants for several arenas. These variants ensure that models are not rewarded simply for producing more polite or better-formatted output, but are instead judged on the quality of their reasoning and accuracy. This commitment to open science gives the broader community the empirical tools needed to scrutinize AI development trends and evaluate how the ecosystem has matured globally.