StepFun Releases Step 3.5 Flash with 11B Active Parameters
- StepFun launches Step 3.5 Flash, an 11B active parameter model rivaling frontier-level intelligence.
- Sparse Mixture-of-Experts architecture achieves top scores in math and coding benchmarks like MathArena.
- Optimized Multi-Token Prediction and attention mechanisms drastically reduce latency and cost for AI agents.
StepFun has unveiled Step 3.5 Flash, a model designed to balance high-level reasoning with the speed required for real-world applications. By utilizing a sparse Mixture-of-Experts (MoE) architecture—activating only 11 billion of its 196 billion parameters for any given task—it achieves frontier-level performance while maintaining computational efficiency.
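The efficiency claim rests on sparse routing: a small router selects a handful of experts per token, so only a fraction of the parameters do any work. A minimal sketch of top-k expert routing follows; the expert count, `top_k` value, and shapes are illustrative assumptions, not Step 3.5 Flash's actual configuration.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route a token through only top_k experts (sparse activation).

    Illustrative sketch: real MoE layers use learned routers and
    far larger expert networks than these toy linear maps.
    """
    logits = x @ router_w                      # router scores, one per expert
    top = np.argsort(logits)[-top_k:]          # indices of the selected experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over selected experts only
    # Only the selected experts run; the remaining experts stay idle,
    # which is why active parameters << total parameters.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(num_experts)]
router_w = rng.normal(size=(d, num_experts))
y = moe_forward(rng.normal(size=d), experts, router_w, top_k=2)
print(y.shape)  # (8,)
```

With `top_k=2` of 16 experts, only 1/8 of the expert parameters participate in each forward pass, mirroring how Step 3.5 Flash activates 11B of its 196B parameters.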
The model introduces structural optimizations to enhance its role as an agentic tool. It employs a 3:1 ratio of sliding-window to full attention layers, allowing it to process long-range context without high memory overhead. Additionally, Multi-Token Prediction (MTP-3) lets the model emit several tokens per decoding step, speeding up generation and lowering costs for complex, multi-turn interactions.
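The memory savings come from the attention masks themselves: a sliding-window layer only attends to a fixed number of recent positions, so its cost grows linearly rather than quadratically with context length. The sketch below builds both mask types and a repeating 3:1 layer schedule; the window size and layer count are illustrative assumptions.

```python
import numpy as np

def attention_mask(seq_len, window=None):
    """Boolean causal mask; with `window` set, each token attends only
    to the most recent `window` positions (sliding-window attention)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                    # causal: no attending to the future
    if window is not None:
        mask &= (i - j) < window     # restrict to the local window
    return mask

def layer_schedule(n_layers, pattern=("sw", "sw", "sw", "full")):
    """Repeat three sliding-window layers per full-attention layer (3:1)."""
    return [pattern[i % len(pattern)] for i in range(n_layers)]

print(layer_schedule(8))
# ['sw', 'sw', 'sw', 'full', 'sw', 'sw', 'sw', 'full']
full = attention_mask(6)             # 21 attended positions
sw = attention_mask(6, window=3)     # 15 attended positions
print(int(full.sum()), int(sw.sum()))  # 21 15
```

The occasional full-attention layer preserves access to long-range context, while the sliding-window layers keep the KV cache and compute cost bounded.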
To achieve these reasoning capabilities, researchers implemented a scalable reinforcement learning framework. The system combines verifiable signals, such as correct math answers, with preference feedback to foster self-improvement. Step 3.5 Flash currently holds the top spot on MathArena and competes directly with industry leaders like Gemini 3.0 Pro and GPT-5.2 xHigh on coding and math benchmarks.
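The key property of such a framework is that the two signal types can be blended into a single scalar reward. A minimal sketch of that blending is shown below; the exact-match check, weights, and linear combination are assumptions for illustration, not StepFun's published training recipe.

```python
def combined_reward(answer: str, reference: str, pref_score: float,
                    w_verify: float = 1.0, w_pref: float = 0.5) -> float:
    """Blend a verifiable signal (exact match on a math answer) with a
    preference-model score in [0, 1]. Weights are illustrative assumptions."""
    verifiable = 1.0 if answer.strip() == reference.strip() else 0.0
    return w_verify * verifiable + w_pref * pref_score

print(combined_reward("42", "42", pref_score=0.8))  # 1.4: correct and preferred
print(combined_reward("41", "42", pref_score=0.8))  # 0.4: preferred but wrong
```

Verifiable rewards anchor the policy to objectively correct outputs, while the preference term shapes behavior on tasks where no ground-truth checker exists.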