Peking University Unveils SpatialScore for Improved Image Generation
- SpatialScore reward model improves how AI interprets complex spatial relationships in generated images.
- Peking University researchers curated SpatialReward-Dataset featuring over 80,000 human preference pairs.
- Online reinforcement learning enables image models to outperform proprietary systems in spatial accuracy.
Current text-to-image models are incredibly creative, yet they often stumble when asked to place objects in specific arrangements. If you ask for a "cat to the left of a blue lamp," the AI might swap their positions or ignore the lamp entirely. Achieving the right layout usually requires frustrating trial and error, which limits the professional utility of these creative tools.
To solve this, researchers from Peking University introduced SpatialScore, a specialized reward model that acts like a judge for spatial logic. This system was trained on the new SpatialReward-Dataset, which contains over 80,000 comparison pairs where one image correctly follows spatial instructions while the other fails. By learning these human-vetted preferences, the model develops a sophisticated "sense" of physical space and object interaction.
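To get an intuition for how a reward model can be trained from comparison pairs like these, here is a minimal sketch using a Bradley-Terry-style preference loss — a common choice for preference-based reward modeling, though the article does not specify SpatialScore's exact objective. The scores and the `preference_loss` helper below are illustrative, not the authors' implementation.

```python
import math
import numpy as np

def preference_loss(scores_good, scores_bad):
    """Bradley-Terry preference loss: -log sigmoid(s_good - s_bad), averaged.

    Pushes the reward model to score the spatially correct image of each
    pair higher than the incorrect one.
    """
    diff = np.asarray(scores_good) - np.asarray(scores_bad)
    # log1p(exp(-d)) == -log(sigmoid(d)), computed stably
    return float(np.mean(np.log1p(np.exp(-diff))))

# Hypothetical reward-model scores for two comparison pairs:
# a model that already ranks correct layouts higher...
trained_loss = preference_loss([2.0, 3.0], [0.0, -1.0])
# ...versus one that cannot tell the pairs apart.
untrained_loss = preference_loss([0.0, 0.0], [0.0, 0.0])
```

Minimizing this loss over the 80,000 pairs is what gives the model its learned "sense" of spatial correctness: an indifferent model sits at a loss of log 2 per pair, and the loss falls as the score gap on correct pairs grows.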
The real breakthrough lies in how this reward model is used. By employing reinforcement learning—a training process where the AI learns through trial, error, and feedback—the generation model can refine its output in real time. This method ensures that the final image doesn't just look good, but also places every object exactly where the user requested.
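The feedback loop described above can be illustrated with a toy REINFORCE-style update, where a judge's score steers a policy toward the requested arrangement. This is a deliberately simplified stand-in — the candidate "layouts", the 0/1 reward, and the update rule are all assumptions for illustration, not the paper's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy policy over 3 candidate layouts; index 0 matches the prompt
# (e.g. "cat to the left of a blue lamp").
logits = np.zeros(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_reward(layout):
    # Stand-in for a SpatialScore-style judge: rewards the correct layout.
    return 1.0 if layout == 0 else 0.0

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    action = rng.choice(3, p=probs)          # sample a layout
    r = spatial_reward(action)               # judge scores it
    # REINFORCE: raise the log-probability of the sampled layout
    # in proportion to the reward it earned.
    grad = -probs
    grad[action] += 1.0
    logits += lr * r * grad

final_probs = softmax(logits)
```

After training, the policy concentrates its probability mass on the rewarded layout — the same dynamic, at a vastly larger scale, by which reward feedback teaches a generation model to honor spatial instructions rather than merely produce pretty images.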
In head-to-head tests, SpatialScore outperformed several leading proprietary models in spatial accuracy. This advancement suggests that future AI tools will move beyond simple aesthetic beauty to master the complex geometry of the physical world, making them far more reliable for professional design, layout planning, and architectural visualization.