OmniLottie Generates Vector Animations via Multi-Modal Instructions
- OmniLottie framework generates high-quality Lottie vector animations from text, image, and video instructions.
- Specialized Lottie tokenizer converts complex JSON structures into manageable sequences for vision-language models.
- Researchers release MMLottie-2M dataset featuring two million professionally designed, richly annotated vector animations.
Lottie has become the industry standard for lightweight, scalable vector animations in web and mobile applications, but its underlying JSON structure is notoriously difficult for AI to generate directly. The sheer volume of metadata and formatting tokens often drowns out the actual animation logic. To solve this, researchers introduced OmniLottie, a framework designed to streamline this process by treating animation parameters as learnable tokens.
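To see why raw Lottie JSON is a poor target for generation, consider how much of a file is pure structure. The snippet below is an illustrative, hand-written document in the Lottie style (not a real export, and much smaller than production files); even here, a single animated position drags along layers of wrapping, and a rough character count shows how structural punctuation competes with the actual animation values:

```python
import json

# Hypothetical minimal Lottie-style document (illustrative only):
# one shape layer whose position "p" animates between two keyframes.
doc = {
    "v": "5.7.4", "fr": 30, "ip": 0, "op": 60, "w": 512, "h": 512,
    "layers": [{
        "ty": 4, "nm": "circle", "ks": {
            "p": {"a": 1, "k": [
                {"t": 0,  "s": [100, 100]},
                {"t": 60, "s": [400, 100]},
            ]},
        },
    }],
}

text = json.dumps(doc)
# Crude split: braces, brackets, quotes, colons, and commas are
# structural; everything else is keys and animation values.
structural = sum(ch in '{}[]":,' for ch in text)
print(f"{structural}/{len(text)} characters are pure JSON structure")
```

Real exported files are far more extreme: production animations carry transform defaults, bezier easing arrays, and asset references for every layer, which is the redundancy the OmniLottie tokenizer is designed to strip.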
At the heart of OmniLottie is a specialized tokenizer that strips away structural redundancy, transforming raw JSON files into structured sequences of commands. This allows the system to bridge the gap between visual intent and code execution. By building upon pretrained vision-language models, the framework can interpret interleaved instructions—such as a text prompt combined with a reference image—to produce semantically aligned motion that feels fluid and professional.
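The paper does not publish the tokenizer's exact vocabulary, but the idea of collapsing nested JSON into a flat command sequence can be sketched as follows. All names here (`tokenize_layer`, the `<pos ...>` token format) are hypothetical stand-ins, assuming Lottie's standard layer layout where `ks.p` holds the position property and `a == 1` marks it as animated:

```python
def tokenize_layer(layer: dict) -> list[str]:
    """Hypothetical sketch: flatten one Lottie layer into command tokens,
    dropping the JSON wrapping and keeping only typed animation values."""
    tokens = [f"<layer:{layer.get('nm', 'unnamed')}>"]
    pos = layer["ks"]["p"]
    if pos.get("a") == 1:
        # Animated property: one compact token per keyframe.
        for kf in pos["k"]:
            x, y = kf["s"]
            tokens.append(f"<pos t={kf['t']} x={x} y={y}>")
    else:
        # Static property: a single value token.
        x, y = pos["k"]
        tokens.append(f"<pos x={x} y={y}>")
    tokens.append("</layer>")
    return tokens


layer = {"nm": "circle", "ks": {"p": {"a": 1, "k": [
    {"t": 0, "s": [100, 100]},
    {"t": 60, "s": [400, 100]},
]}}}
print(tokenize_layer(layer))
# Four short tokens replace dozens of braces, quotes, and keys.
```

The same principle would extend to scale, rotation, opacity, and path data: each property becomes a typed command, giving the vision-language model a sequence where every token carries animation meaning rather than formatting.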
To support this new frontier in generative media, the team curated MMLottie-2M, a massive dataset of two million animations. This provides the first large-scale foundation for training models that understand the nuances of vector motion. The project effectively moves beyond simple static image generation, offering designers a powerful tool for creating functional, resolution-independent UI elements and character animations through natural language instructions.