NAVER AI Lab Unveils SWM City-Scale Simulation Model

• NAVER AI Lab introduces Seoul World Model for grounded real-world city simulations.
• SWM uses retrieval-augmented generation over millions of street-view images to ensure spatial accuracy.
• New Virtual Lookahead Sink technique enables stable video generation across long urban trajectories spanning hundreds of meters.

Imagine a world model that doesn't just hallucinate a dreamlike city, but accurately renders the streets of Seoul as they actually exist. Researchers at NAVER AI Lab have developed the Seoul World Model (SWM), a city-scale simulation grounded in real-world data rather than purely synthetic imagination. While traditional world models often struggle with consistency over long distances, SWM anchors its video generation by pulling from a massive database of real street-view images.
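To make the retrieval idea concrete, here is a minimal sketch of looking up the street-view captures nearest to a query position. It is an illustration only, not NAVER's pipeline: the `StreetView` record, the `nearest_references` helper, the image IDs, and the approximate coordinates are all hypothetical, and a real system would use a spatial index rather than sorting the whole database.

```python
import math
from typing import NamedTuple


class StreetView(NamedTuple):
    """Hypothetical record for one geotagged street-view capture."""
    image_id: str
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def nearest_references(lat, lon, database, k=2):
    """Return the k captures closest to (lat, lon), nearest first (brute force)."""
    return sorted(database, key=lambda sv: haversine_m(lat, lon, sv.lat, sv.lon))[:k]


# Made-up database with approximate coordinates, for illustration only.
database = [
    StreetView("city_hall", 37.5663, 126.9779),
    StreetView("gangnam", 37.4979, 127.0276),
    StreetView("busan_station", 35.1152, 129.0422),
]
references = nearest_references(37.5665, 126.9780, database, k=2)
```

The retrieved images would then be passed to the video model as conditioning, which is the part this sketch leaves out.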

To bridge the gap between static street photos and dynamic video, the team implemented a view interpolation pipeline that creates smooth training videos from sparse captures. They also introduced 'cross-temporal pairing' to handle the misalignment between retrieved reference images and the specific lighting or traffic of the target scene. This allows the model to maintain visual fidelity even when the source data is years old or taken at a different time of day.
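One plausible reading of cross-temporal pairing is that each training target is matched with a reference image of the same place captured at a different time, so the model cannot simply copy transient details such as lighting or traffic. The sketch below follows that reading as an assumption; the `Capture` record, its fields, and the `cross_temporal_pairs` helper are hypothetical and not taken from the SWM code.

```python
from collections import defaultdict
from typing import NamedTuple


class Capture(NamedTuple):
    """Hypothetical record for one street-view capture."""
    image_id: str
    location_id: str
    captured: str  # capture period, e.g. "2019-05"


def cross_temporal_pairs(captures):
    """Build (target_id, reference_id) pairs that share a location but not a capture time."""
    by_location = defaultdict(list)
    for capture in captures:
        by_location[capture.location_id].append(capture)

    pairs = []
    for group in by_location.values():
        for target in group:
            for reference in group:
                # Skip same-period references: they would share lighting and traffic.
                if reference.captured != target.captured:
                    pairs.append((target.image_id, reference.image_id))
    return pairs


captures = [
    Capture("a", "loc1", "2019-05"),
    Capture("b", "loc1", "2023-11"),
    Capture("c", "loc2", "2021-08"),
]
pairs = cross_temporal_pairs(captures)
```

Locations with only one capture period, like `loc2` here, produce no pairs under this scheme.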

One of the most notable contributions is the Virtual Lookahead Sink. This mechanism stabilizes long-horizon generation by continuously re-grounding the model on an image of an upcoming location along the route, limiting the 'drift' or distortion that usually accumulates in long AI-generated videos. In tests across Seoul and Busan, the researchers report that SWM generates spatially faithful videos spanning hundreds of meters, a step toward realistic simulation for autonomous vehicle training.
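The re-grounding idea can be sketched as a schedule: the route is generated in chunks, and each chunk is anchored to the reference image closest to a point some distance beyond where that chunk ends. This is a toy under stated assumptions, not the published mechanism: the chunked rollout, the `lookahead_anchors` helper, and every distance below are hypothetical, and no video model is involved.

```python
from bisect import bisect_left


def lookahead_anchors(route_length_m, chunk_m, lookahead_m, reference_positions_m):
    """Return (chunk_start, chunk_end, anchor_position) for each chunk of the route.

    Positions are distances in meters along the route. The anchor is the
    reference position nearest to a point `lookahead_m` past the chunk end.
    """
    refs = sorted(reference_positions_m)
    if not refs:
        raise ValueError("at least one reference position is required")

    schedule = []
    start = 0.0
    while start < route_length_m:
        end = min(start + chunk_m, route_length_m)
        target = end + lookahead_m
        # Only the two neighbors around the insertion point can be nearest.
        i = bisect_left(refs, target)
        candidates = refs[max(i - 1, 0): i + 1]
        anchor = min(candidates, key=lambda position: abs(position - target))
        schedule.append((start, end, anchor))
        start = end
    return schedule


schedule = lookahead_anchors(
    route_length_m=300,
    chunk_m=100,
    lookahead_m=50,
    reference_positions_m=[0, 60, 140, 260, 340],
)
```

In this toy run every chunk is tied to a real image at or beyond its own end point, which is the intuition behind anchoring generation to where the route is heading rather than only to frames already generated.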
