Anthropic Releases Bloom: An Open Source Framework for Automated AI Evaluation
- Anthropic open-sources Bloom, an agentic framework for measuring specific AI behavioral traits and misalignments
- Four-stage pipeline automates scenario generation, interaction rollouts, and behavior scoring across 16 frontier models
- Tool quantifies frequency and severity of behaviors like sycophancy and sabotage through reproducible seed configurations
Anthropic has released Bloom, an open-source agentic framework designed to automate the evaluation of AI model behavior. Unlike traditional audits, which look for broad issues, Bloom focuses on measuring specific, researcher-defined traits. It uses a structured four-stage pipeline—Understanding, Ideation, Rollout, and Judgment—to generate diverse scenarios and quantify how often a model exhibits a target behavior. This lets researchers skip manual evaluation engineering and move directly to measuring complex propensities across different systems.

The system works by using an AI agent to first understand a behavior description, then ideate specific test scenarios, and finally conduct rollouts in which the model under evaluation interacts with a simulated environment. A judge model then scores these interactions to determine the presence and severity of the behavior. Anthropic researchers used Bloom to benchmark 16 frontier models (LLMs) on traits like delusional sycophancy—telling the user what they want to hear regardless of truth—and long-horizon sabotage.

Bloom is meant to complement Petri, another recently released Anthropic tool. While Petri acts as an auditor to discover new types of misalignment, Bloom is built for precise measurement: it helps researchers understand whether a model is becoming more or less aligned as its capabilities grow. By making the tool open source, Anthropic aims to give the research community a standardized way to track emergent properties—unexpected behaviors that arise as models scale—and support AI safety work throughout the development lifecycle.

The framework is highly configurable via seed files, which act like DNA for the evaluation process. These files specify the target behavior, examples, and the models used at each stage. This reproducibility ensures that metrics can be compared fairly across different teams and timeframes. The tool is currently available on GitHub.
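To make the pipeline concrete, here is a minimal Python sketch of how a seed configuration might drive the four stages. All class names, fields, and model identifiers below are illustrative assumptions, not Bloom's actual API; each stage is stubbed rather than calling real models.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Bloom-style seed and pipeline.
# Names and schema are assumptions for illustration only.

@dataclass
class Seed:
    behavior: str                      # researcher-defined trait to measure
    examples: list[str] = field(default_factory=list)
    ideation_model: str = "model-a"    # generates test scenarios
    target_model: str = "model-b"      # model under evaluation
    judge_model: str = "model-c"       # scores each rollout

def run_eval(seed: Seed, n_scenarios: int = 3) -> dict:
    """Walk the four stages: Understanding, Ideation, Rollout, Judgment."""
    # Understanding: restate the behavior description for later stages (stubbed).
    spec = f"Measure: {seed.behavior}"
    # Ideation: derive distinct test scenarios from the spec (stubbed).
    scenarios = [f"{spec} | scenario {i}" for i in range(n_scenarios)]
    # Rollout: the target model interacts with each simulated scenario (stubbed).
    transcripts = [f"transcript for {s}" for s in scenarios]
    # Judgment: a judge model assigns a severity score per transcript (stubbed).
    scores = [0.0 for _ in transcripts]
    # Aggregate into the frequency/severity metrics the article describes.
    return {
        "frequency": sum(s > 5.0 for s in scores) / len(scores),
        "mean_severity": sum(scores) / len(scores),
    }
```

Because the seed fixes the behavior, examples, and per-stage models, rerunning `run_eval` with the same seed yields a reproducible measurement, which is what allows comparisons across teams and timeframes.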