
Alibaba's GUI-Owl-1.5 Automates Desktop and Mobile Interfaces

• Alibaba releases GUI-Owl-1.5, a multi-platform agent capable of controlling desktop, mobile, and web interfaces.
• Models range from 2B to 235B parameters, setting new records on OSWorld and AndroidWorld benchmarks.
• New MRPO reinforcement learning algorithm optimizes agent performance across complex, long-horizon multi-platform tasks.

Alibaba’s Tongyi Lab has unveiled GUI-Owl-1.5, a versatile agentic AI model designed to navigate and operate digital interfaces the way a human user would. It supports desktop, mobile, and web browser platforms, enabling "cloud-edge" collaboration in which tasks can be handed off between devices in real time.

The suite spans several sizes, from a nimble 2B version for local execution to a 235B-parameter model, and posts leading results on more than 20 GUI benchmarks. It scores 56.5 on OSWorld and 71.6 on AndroidWorld, a significant leap in both understanding screen layouts (grounding) and executing multi-step commands (automation).

To reach this level of precision, researchers developed a "Hybrid Data Flywheel" that combines simulated environments with cloud-based sandboxes to generate high-quality training data. They also introduced a new reinforcement learning algorithm called MRPO. This multimodal technique targets two problems: the friction caused by switching between platforms and the difficulty of staying on track during long-horizon tasks.

By open-sourcing these models, Alibaba provides a robust foundation for developers to build sophisticated AI assistants that can manage everything from booking travel across multiple apps to troubleshooting technical software issues.

