Veo 3.1 Ingredients to Video: More consistency, creativity and control
- Google DeepMind launches Veo 3.1 with enhanced Image-to-Image capabilities and character consistency
- New native vertical output support and 4K upscaling for mobile-first video production
- SynthID watermarking expanded to help users verify AI-generated videos in the Gemini app
- **Multimodal AI:** Artificial intelligence systems that can understand, interpret, and generate information across multiple formats such as text, images, and video.
- **Image-to-Image:** A process in generative AI where an input image is used as a reference or starting point to create a new, modified, or animated visual output.
Google DeepMind has released Veo 3.1, an advanced Multimodal AI model designed to give creators more control over generative video. A core addition is the 'Ingredients to Video' feature, which uses Image-to-Image techniques to transform still photos into dynamic clips. Ricky Wong, Lead Product Manager at Google DeepMind, explained that the model now supports native vertical output for mobile platforms like YouTube Shorts and provides state-of-the-art upscaling to 1080p and 4K resolution for professional-grade fidelity.

A major technical hurdle addressed in this update is identity consistency. Characters, objects, and backgrounds now remain stable across different scenes, which prevents the common AI issue where visual elements shift appearance between frames.

These capabilities are being integrated into the Gemini app and various developer tools, making the technology accessible to both casual users and studio professionals. Google also emphasized transparency, noting that all generated content is marked with SynthID to help users verify AI-generated media.

Beyond these video improvements, Google continues to invest in foundational research. The company recently highlighted a quantum computing milestone, underscoring its commitment to hardware advances that could eventually power sophisticated models like Veo. For now, the focus remains on giving filmmakers and social media creators tools that blend disparate textures and characters into cohesive, high-impact storytelling.