Alibaba's GUI-Owl-1.5 Automates Desktop and Mobile Interfaces
- Alibaba releases GUI-Owl-1.5, a multi-platform agent capable of controlling desktop, mobile, and web interfaces.
- Models range from 2B to 235B parameters, setting new records on the OSWorld and AndroidWorld benchmarks.
- A new reinforcement learning algorithm, MRPO, optimizes agent performance on complex, long-horizon, multi-platform tasks.
Alibaba’s Tongyi Lab has unveiled GUI-Owl-1.5, a versatile agentic AI model designed to navigate and operate digital interfaces much as a human user would. Because it supports desktop, mobile, and web platforms, the model enables "cloud-edge" collaboration, in which tasks can be handed off between devices in real time.
The suite spans several model sizes, from a compact 2B version for local execution to a 235B-parameter flagship, and the family posts leading results on more than 20 GUI benchmarks. It scores 56.5 on OSWorld and 71.6 on AndroidWorld, a significant improvement in its ability to understand screen layouts (grounding) and execute multi-step commands (automation).
To reach this level of precision, the researchers built a "Hybrid Data Flywheel," which combines simulated environments with cloud-based sandboxes to generate high-quality training data. They also introduced MRPO, a new reinforcement learning algorithm. This multimodal technique specifically targets the friction caused by platform switching and the difficulty of staying on track during long-horizon tasks.
By open-sourcing these models, Alibaba provides a robust foundation for developers to build sophisticated AI assistants that can manage everything from booking travel across multiple apps to troubleshooting technical software issues.