UC Berkeley Launches Unified Diffusion Language Model Framework
- UC Berkeley introduces dLLM, a framework that unifies diffusion language model training, inference, and evaluation.
- A new Fast-dLLM integration delivers a 2-4x speedup through block-wise caching and parallel decoding techniques.
- The framework converts ModernBERT and Qwen3 models into diffusion-based chat systems with minimal compute.
While autoregressive models—which predict text one word at a time from left to right—dominate the current landscape, diffusion language models (DLMs) offer a promising alternative by refining entire sequences of text simultaneously. However, the development of these models has been slowed by fragmented codebases and inconsistent evaluation methods. Researchers at UC Berkeley have addressed this by releasing dLLM, a comprehensive open-source framework designed to standardize the core components of diffusion-based modeling for both large- and small-scale applications.
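The refine-in-parallel idea can be illustrated with a minimal, self-contained sketch. All names here (`toy_denoiser`, `diffusion_decode`) are hypothetical stand-ins, not dLLM's actual API: the point is only the control flow, in which decoding starts from a fully masked sequence and repeatedly commits the model's most confident predictions anywhere in the sequence, rather than left to right.

```python
import random

random.seed(0)

MASK = "<mask>"
TARGET = ["diffusion", "models", "refine", "whole", "sequences", "in", "parallel"]

def toy_denoiser(seq):
    """Stand-in for a diffusion LM: for every masked position, return a
    (token, confidence) prediction. A real model would run a full forward
    pass over the whole sequence; here we simply 'know' the target with a
    random confidence to illustrate the decoding loop."""
    return {i: (TARGET[i], random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def diffusion_decode(length, steps=4):
    """Start fully masked; each step, commit the most confident half of the
    remaining masked positions. Every position is eligible at every step --
    there is no left-to-right ordering constraint."""
    seq = [MASK] * length
    for _ in range(steps):
        preds = toy_denoiser(seq)
        if not preds:
            break
        k = max(1, len(preds) // 2)
        # Highest-confidence predictions are committed first.
        for i, (tok, _) in sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]:
            seq[i] = tok
    # If steps ran out, commit whatever is still masked.
    for i, (tok, _) in toy_denoiser(seq).items():
        seq[i] = tok
    return seq

print(" ".join(diffusion_decode(len(TARGET))))
```

Real DLMs iterate the same basic loop, but the "confidence" comes from the model's token distribution and the remasking schedule is tuned rather than a fixed halving.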
The dLLM framework introduces a plug-and-play inference system that decouples the model's architecture from the sampling algorithm used to generate text. This flexibility lets users drop in Fast-dLLM, a technique that achieves significant speedups by predicting multiple tokens in parallel and reusing previously computed attention states (the KV cache). Because diffusion models do not decode in a strict left-to-right order, dLLM also ships a visualizer that shows how tokens evolve across the entire sequence during generation, providing a unique window into how these models assemble coherent language.
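The decoupling of model and sampler, and the gain from parallel commits, can be sketched as follows. This is a toy illustration under stated assumptions, not dLLM's interface: `toy_model`, `sequential_sampler`, and `parallel_sampler` are hypothetical names, and the real Fast-dLLM additionally maintains a block-wise KV cache so unchanged blocks are not recomputed between passes.

```python
MASK = "<mask>"
TARGET = ["hello", "from", "a", "parallel", "decoder"]
CONF = [0.99, 0.95, 0.40, 0.97, 0.60]  # fixed toy per-position confidences

def toy_model(seq):
    """Stand-in forward pass: yields (position, token, confidence) for each
    masked slot. A real model would recompute these from context each pass."""
    return [(i, TARGET[i], CONF[i]) for i, t in enumerate(seq) if t == MASK]

def sequential_sampler(preds):
    """Baseline: commit exactly one token per forward pass."""
    return [max(preds, key=lambda p: p[2])]

def parallel_sampler(preds, threshold=0.9):
    """Fast-dLLM-style: commit every prediction above a confidence threshold
    in one pass, falling back to the single best so decoding always advances."""
    sure = [p for p in preds if p[2] >= threshold]
    return sure or [max(preds, key=lambda p: p[2])]

def decode(length, sampler):
    """The decode loop only sees a sampler function, so strategies are
    swappable without touching the model."""
    seq, passes = [MASK] * length, 0
    while MASK in seq:
        passes += 1
        for i, tok, _ in sampler(toy_model(seq)):
            seq[i] = tok
    return seq, passes

print(decode(len(TARGET), sequential_sampler))  # 5 forward passes
print(decode(len(TARGET), parallel_sampler))    # 3 forward passes
```

With these toy confidences, the parallel sampler finishes in 3 passes instead of 5, which is the intuition behind Fast-dLLM's reported 2-4x speedup: fewer forward passes for the same output.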
In a practical demonstration of the framework's versatility, the team converted existing encoder-only models and standard autoregressive systems into diffusion-based chatbots. Notably, their ModernBERT-large-chat variant outperformed several popular small models on reasoning benchmarks despite its non-traditional architecture. By providing reproducible recipes and pre-trained checkpoints, the dLLM project lowers the barrier for researchers to explore diffusion as a viable path toward more efficient and flexible language generation.