Nebius Unveils Massive Dataset to Advance Autonomous AI Engineering
- Nebius launches SWE-rebench V2, featuring over 32,000 executable software engineering tasks.
- The new language-agnostic pipeline spans 3,600+ GitHub repositories across 20 programming languages.
- An automated validation system uses AI judges to ensure environment stability and filter unreliable data.
The development of autonomous AI agents capable of software engineering has long been bottlenecked by the scarcity of diverse, high-quality training data. While reinforcement learning—where models learn through trial and error—has driven recent progress, researchers have struggled to source enough reproducible coding challenges across programming languages. Addressing this gap, Nebius has unveiled SWE-rebench V2, a language-neutral pipeline designed to collect and verify real-world software engineering tasks at an unprecedented scale.
The release is significant for its vast scope and high level of automation. Moving beyond the traditional focus on resource-rich languages like Python, the dataset covers 20 different programming languages across more than 3,600 repositories. The research team employed interactive setup agents to synthesize installation procedures and established a panel of AI judges to filter out unreliable data, successfully building a dataset of over 32,000 executable tasks. This provides a robust foundation for AI models to practice fixing bugs and implementing features in realistic environments.
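The panel-of-judges filtering step described above can be sketched in miniature. This is a hypothetical illustration, not Nebius's actual implementation: the `Task` structure, field names, and the supermajority threshold are all assumptions for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    repo: str
    task_id: str
    # One boolean verdict per AI judge: True = "environment reliable".
    judge_verdicts: list = field(default_factory=list)


def filter_by_judge_panel(tasks, min_agreement=0.66):
    """Keep only tasks that a supermajority of judges marked reliable.

    Tasks with no verdicts cannot be validated and are dropped outright.
    """
    kept = []
    for task in tasks:
        if not task.judge_verdicts:
            continue
        agreement = sum(task.judge_verdicts) / len(task.judge_verdicts)
        if agreement >= min_agreement:
            kept.append(task)
    return kept
```

A majority vote across several independent judges is a common way to smooth over the noise of any single model's verdict; the real pipeline may weight judges or use richer per-task signals.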
In addition to the core executable data, the release includes 120,000 supplementary tasks extracted from pull request descriptions. These come with metadata to identify common pitfalls, such as overly restrictive tests that can mislead learning models. By open-sourcing these results and the execution code, the researchers aim to democratize the training of sophisticated software agents and enable them to generalize their problem-solving capabilities across the global software ecosystem.
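Consumers of the supplementary tasks would likely want to screen out entries flagged with known pitfalls before training. A minimal sketch of such a filter follows; the record layout and flag names (e.g. `pitfall_flags`, `"overly_restrictive_tests"`) are invented for illustration and may not match the released metadata schema.

```python
def select_trainable(tasks, banned_flags=("overly_restrictive_tests",)):
    """Drop supplementary tasks whose metadata carries any banned pitfall flag."""
    banned = set(banned_flags)
    return [
        task for task in tasks
        if not banned & set(task.get("pitfall_flags", []))
    ]
```

In practice the banned set could be tuned per experiment, since a flag that misleads one training objective (e.g. reward hacking against narrow tests) may be harmless for another.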