PFN Boosts Web Agent Performance via Knowledge and Retries
- Preferred Networks achieves SOTA on WorkArena Level 2 using a GPT-5-based Web Agent.
- Novel method summarizes 'Web knowledge' from past logs to prevent common browser operation errors.
- Retry strategy detects loops and switches models, improving baseline performance by 11.5%.
Preferred Networks (PFN) has unveiled a new method to dramatically enhance the performance of "Web agents"—AI designed to autonomously navigate browsers and execute complex business tasks.
During a summer internship project, the research team analyzed agent behavior using GPT-5-mini on the WorkArena benchmark, which simulates difficult operations in enterprise applications like ServiceNow.
Their findings show significant accuracy improvements and a new State-of-the-Art (SOTA) record in tasks requiring complex conditional branching, which previously stumped traditional models.
Web agents often struggle with "operational errors," such as misinterpreting what a button does, and with "infinite loops," where the agent repeats the same action indefinitely.
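The article does not detail how such loops are detected, but a minimal check is to flag when the agent emits the same action several times in a row. The function and threshold below are illustrative assumptions, not PFN's implementation:

```python
def detects_loop(action_history, window=3):
    """Return True if the last `window` actions are identical,
    a simple signal that the agent is stuck in a loop.
    (Illustrative sketch; the threshold is an assumption.)"""
    if len(action_history) < window:
        return False
    return len(set(action_history[-window:])) == 1

# The agent keeps clicking the same element without progress:
history = ["click('#submit')", "click('#submit')", "click('#submit')"]
print(detects_loop(history))  # → True
```

Real agents would likely compare richer state (page URL, DOM snapshot) rather than raw action strings, but the idea is the same.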
To combat this, the study introduces "Web knowledge accumulation," a technique that abstracts insights from past success and failure logs.
By summarizing specific UI interactions into triplets of "Intent," "Correct Action (OK)," and "Incorrect Action (NG)," the system provides auxiliary guidance during the agent’s inference phase.
This allows the agent to treat past experiences as a universal manual, enabling confident navigation even on unfamiliar web pages.
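A sketch of what such accumulated knowledge might look like as data, and how it could be rendered into auxiliary prompt guidance. The class and field names are hypothetical; the article only specifies the Intent/OK/NG triplet structure:

```python
from dataclasses import dataclass

@dataclass
class WebKnowledge:
    """One triplet summarized from past success/failure logs.
    Field names are assumptions based on the article's description."""
    intent: str      # what the agent was trying to achieve
    ok_action: str   # action that succeeded in past logs (OK)
    ng_action: str   # action that failed in past logs (NG)

def format_knowledge(entries):
    """Render triplets as guidance text to attach at inference time."""
    return "\n".join(
        f"Intent: {e.intent}\n  OK: {e.ok_action}\n  NG: {e.ng_action}"
        for e in entries
    )

# Hypothetical example entry for a ServiceNow-style form:
knowledge = [
    WebKnowledge(
        intent="Submit a filled-in form",
        ok_action="click the form's 'Submit' button",
        ng_action="press Enter in a text field (triggers search instead)",
    ),
]
print(format_knowledge(knowledge))
```

Because the triplets abstract away page-specific details, the same guidance can be injected into the prompt regardless of which site the agent is currently navigating.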
The system's robustness is further bolstered by an innovative "retry strategy."
When the agent detects a loop or a "false completion"—where it claims to be finished despite the task being incomplete—the system rewinds the history by several steps.
It then temporarily swaps the base model for a different one, such as Claude-3.7-Sonnet, to re-attempt the step.
By bringing in a "guest model" with different reasoning tendencies, the agent can break through logical dead ends that a single model could not overcome.
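The retry flow described above can be sketched as a control loop. Everything here (function names, the rewind depth, the loop check) is an illustrative assumption; the article describes the strategy only at a high level:

```python
def is_looping(history, window=3):
    # Hypothetical loop check: same action repeated `window` times.
    return len(history) >= window and len(set(history[-window:])) == 1

def run_with_retry(step_fn, guest_step_fn, is_done, max_steps=20, rewind=3):
    """Sketch of the retry strategy: on a detected loop, rewind the
    action history by `rewind` steps and let a guest model (with
    different reasoning tendencies) take the next step. `step_fn` and
    `guest_step_fn` map a history to the next action; `is_done` checks
    task completion. All names and parameters are assumptions."""
    history, use_guest = [], False
    for _ in range(max_steps):
        fn = guest_step_fn if use_guest else step_fn
        history.append(fn(history))
        use_guest = False  # the model swap is temporary
        if is_done(history):
            return history
        if is_looping(history):
            history = history[:max(0, len(history) - rewind)]
            use_guest = True  # hand the next step to the guest model
    return history

# Toy run: the primary model is stuck repeating one click,
# the guest model breaks the dead end and finishes the task.
primary = lambda h: "click('#submit')"
guest = lambda h: "finish"
result = run_with_retry(primary, guest, lambda h: h[-1] == "finish")
print(result)  # → ['finish']
```

A similar trigger would fire on "false completion": if a verifier judges the task incomplete despite the agent claiming success, the same rewind-and-swap path applies.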
This approach marks a significant evolution of AI agents from simple instruction-followers to autonomous partners capable of self-correction.