Academic Team Releases Open-Source Search Agent OpenSeeker
- OpenSeeker achieves state-of-the-art search performance using only 11.7k synthesized training samples.
- Academic researchers fully open-source high-quality training data to challenge industrial corporate dominance.
- The model outperforms Alibaba’s Tongyi DeepResearch on major benchmarks using simple supervised fine-tuning.
OpenSeeker marks a significant shift in the AI landscape by proving that high-performance search agents do not require massive corporate resources or millions of data points. Developed by an academic team, this project directly addresses the "data moat" that has long kept frontier-level search capabilities behind the closed doors of tech giants. By releasing both the model and its full training recipe, the researchers aim to level the playing field for the global community.
The core of the project's success lies in two technical breakthroughs: reverse-engineering web graphs to create complex tasks, and a denoising mechanism that helps models learn from messy internet data. Using a technique called entity obfuscation, the researchers generate difficult queries that force the agent to connect multiple pieces of information step by step (multi-hop reasoning). This allows the model to master complex information retrieval without the massive data scale typical of industrial labs.
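To make the idea concrete, here is a minimal sketch of entity obfuscation. This is an illustrative toy, not the team's actual pipeline: the clue table, function name, and example questions are all invented for this example. The principle is that replacing a named entity with an indirect description turns a single-hop question into one that requires an extra retrieval step.

```python
# Hypothetical clue table (assumption): maps an entity to an indirect
# description that could be mined from a page linking to it in a web graph.
CLUES = {
    "Marie Curie": "the first person to win Nobel Prizes in two different sciences",
    "Kyoto": "the Japanese city that was Japan's imperial capital until 1868",
}

def obfuscate(question: str, entity: str) -> str:
    """Rewrite a single-hop question into a multi-hop one by hiding the entity
    behind an indirect description, forcing the agent to resolve it first."""
    return question.replace(entity, CLUES[entity])

seed = "In which year did Marie Curie receive her second Nobel Prize?"
multi_hop = obfuscate(seed, "Marie Curie")
print(multi_hop)
# The agent must first work out that the description refers to Marie Curie,
# then retrieve the year -- two hops instead of one.
```

A real pipeline would mine such descriptions automatically by walking backlinks in a web graph, which is where the "reverse-engineering" described above comes in.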
Remarkably, OpenSeeker achieved its results through a single supervised fine-tuning (SFT) run, bypassing the expensive reinforcement learning and continual pre-training pipelines used by industry leaders. This efficiency allowed it to surpass Alibaba's Tongyi DeepResearch on Chinese-language benchmarks while remaining entirely transparent. The project provides the broader research community with a high-quality foundation to innovate without the need for corporate-scale infrastructure.
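For readers unfamiliar with the approach, the SFT objective itself is simple: standard next-token cross-entropy, typically masked so that only the agent's response tokens contribute to the loss. The sketch below (an assumption about the standard recipe, not OpenSeeker's released code) shows that masking convention on toy numbers.

```python
def sft_loss(token_logprobs, loss_mask):
    """Mean negative log-likelihood over tokens where loss_mask is 1.

    token_logprobs: model log-probability of each target token in the sequence.
    loss_mask: 1 for response tokens to train on, 0 for prompt/context tokens,
               so the model is not trained to regurgitate retrieved web text.
    """
    picked = [-lp for lp, keep in zip(token_logprobs, loss_mask) if keep]
    return sum(picked) / len(picked)

# Toy sequence: two prompt tokens (masked out) and two response tokens.
loss = sft_loss([-0.1, -2.0, -0.5, -1.5], [0, 0, 1, 1])
print(loss)  # 1.0, the mean of 0.5 and 1.5
```

Because this objective needs only one pass over a fixed dataset, it avoids the reward modeling, rollouts, and infrastructure that reinforcement learning pipelines require, which is what makes the single-run result notable.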