StepFun Releases Step 3.5 Flash with 11B Active Parameters
- StepFun launches Step 3.5 Flash, an 11B active parameter model rivaling frontier-level intelligence.
- Sparse Mixture-of-Experts architecture achieves top scores in math and coding benchmarks like MathArena.
- Optimized Multi-Token Prediction and attention mechanisms drastically reduce latency and cost for AI agents.
StepFun has unveiled Step 3.5 Flash, a model designed to balance high-level reasoning with the speed required for real-world applications. By utilizing a sparse Mixture-of-Experts (MoE) architecture—activating only 11 billion of its 196 billion parameters for any given task—it achieves frontier-level performance while maintaining computational efficiency.
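The efficiency claim rests on sparse routing: a small router selects a handful of experts per token, so only a fraction of the parameters do any work. A minimal sketch of top-k expert routing follows; the expert count, `top_k` value, and shapes are illustrative assumptions, not Step 3.5 Flash's actual configuration.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Route a token through only top_k experts (sparse activation).

    Illustrative sketch: real MoE layers use learned routers and
    far larger expert networks than these toy linear maps.
    """
    logits = x @ router_w                      # router scores, one per expert
    top = np.argsort(logits)[-top_k:]          # indices of the selected experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over selected experts only
    # Only the selected experts run; the remaining experts stay idle,
    # which is why active parameters << total parameters.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(num_experts)]
router_w = rng.normal(size=(d, num_experts))
y = moe_forward(rng.normal(size=d), experts, router_w, top_k=2)
print(y.shape)  # (8,)
```

With `top_k=2` of 16 experts, only 1/8 of the expert parameters participate in each forward pass, mirroring how Step 3.5 Flash activates 11B of its 196B parameters.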
The model introduces structural optimizations to enhance its role as an agentic tool. It employs a 3:1 ratio of sliding-window to full attention layers, allowing it to process long-range context without high memory overhead. Additionally, Multi-Token Prediction (MTP-3) lets the model emit several tokens per decoding step, speeding up generation and lowering costs for complex, multi-turn interactions.
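The memory savings come from the attention masks themselves: a sliding-window layer only attends to a fixed number of recent positions, so its cost grows linearly rather than quadratically with context length. The sketch below builds both mask types and a repeating 3:1 layer schedule; the window size and layer count are illustrative assumptions.

```python
import numpy as np

def attention_mask(seq_len, window=None):
    """Boolean causal mask; with `window` set, each token attends only
    to the most recent `window` positions (sliding-window attention)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i                    # causal: no attending to the future
    if window is not None:
        mask &= (i - j) < window     # restrict to the local window
    return mask

def layer_schedule(n_layers, pattern=("sw", "sw", "sw", "full")):
    """Repeat three sliding-window layers per full-attention layer (3:1)."""
    return [pattern[i % len(pattern)] for i in range(n_layers)]

print(layer_schedule(8))
# ['sw', 'sw', 'sw', 'full', 'sw', 'sw', 'sw', 'full']
full = attention_mask(6)             # 21 attended positions
sw = attention_mask(6, window=3)     # 15 attended positions
print(int(full.sum()), int(sw.sum()))  # 21 15
```

The occasional full-attention layer preserves access to long-range context, while the sliding-window layers keep the KV cache and compute cost bounded.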
To achieve these reasoning capabilities, researchers implemented a scalable reinforcement learning framework. The system combines verifiable signals, such as correct math answers, with preference feedback to foster self-improvement. Step 3.5 Flash currently holds the top spot on MathArena and competes directly with industry leaders like Gemini 3.0 Pro and GPT-5.2 xHigh on coding and math benchmarks.
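The key property of such a framework is that the two signal types can be blended into a single scalar reward. A minimal sketch of that blending is shown below; the exact-match check, weights, and linear combination are assumptions for illustration, not StepFun's published training recipe.

```python
def combined_reward(answer: str, reference: str, pref_score: float,
                    w_verify: float = 1.0, w_pref: float = 0.5) -> float:
    """Blend a verifiable signal (exact match on a math answer) with a
    preference-model score in [0, 1]. Weights are illustrative assumptions."""
    verifiable = 1.0 if answer.strip() == reference.strip() else 0.0
    return w_verify * verifiable + w_pref * pref_score

print(combined_reward("42", "42", pref_score=0.8))  # 1.4: correct and preferred
print(combined_reward("41", "42", pref_score=0.8))  # 0.4: preferred but wrong
```

Verifiable rewards anchor the policy to objectively correct outputs, while the preference term shapes behavior on tasks where no ground-truth checker exists.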