Paper Summary
Paperzilla title
Forget Big Data, Small Smart Data Makes AIs Act Like Geniuses!
This paper introduces LIMI, showing that a strategically curated dataset of just 78 samples can dramatically boost AI agent performance on specific benchmarks (such as "vibe coding" and research workflows), significantly outperforming models trained on much larger, uncurated datasets. This "Less Is More" principle challenges traditional scaling laws for AI agency, suggesting that data quality and curation matter more than sheer volume.
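Mechanically, the recipe amounts to standard supervised fine-tuning, just on a tiny, hand-curated set of agentic trajectories. A minimal sketch of that kind of setup is shown below, assuming Hugging Face's trl library; the base-model id and data file are hypothetical placeholders, not the paper's actual artifacts.

    # Minimal sketch (assumptions, not the authors' code): supervised
    # fine-tuning on a tiny curated dataset via Hugging Face's trl.
    # The model id and data file below are hypothetical placeholders.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # ~78 curated trajectories, one JSON object per line with a "text" field.
    dataset = load_dataset("json", data_files="limi_trajectories.jsonl", split="train")

    trainer = SFTTrainer(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder foundation model
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="limi-sft",
            num_train_epochs=3,             # tiny data permits multiple passes
            per_device_train_batch_size=1,
            learning_rate=2e-5,
        ),
    )
    trainer.train()

The point of the sketch is that nothing in the training loop itself is novel; the claimed gains come from what goes into the 78-sample dataset.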
Possible Conflicts of Interest
Yes, conflicts of interest are identified. Several authors are affiliated with SII-GAIR and ASI, which appear to be involved in developing the LIMI approach, some of the baseline models (e.g., GLM-4.5, GLM-4.5-Air), and the evaluation environment (SII CLI, AgencyBench). This constitutes a conflict, as the researchers are evaluating their own products and frameworks.
Identified Weaknesses
Limited Scope of "Agency" Definition
The study defines and evaluates agency specifically through 'collaborative software development and scientific research workflows.' While these domains are important, the findings might not fully generalize to complex agentic tasks in other domains.
Small Dataset Generalizability
The results are impressive on the tested benchmarks, but the extremely small dataset (78 samples) raises questions about how well these agents would perform on, or adapt to, vastly different, novel agentic tasks or environments not covered by the curated data.
Reliance on Foundation Models
LIMI is a fine-tuning approach applied to existing large language models. The observed improvements in 'agency' are enhancements of pre-existing capabilities rather than capabilities created de novo, and they depend on the quality and architecture of the underlying foundation models.
Labor-Intensive Data Curation
Producing the high-quality training data involves 'human-AI collaborative query collection' and a 'systematic trajectory collection protocol' carried out by PhD-student annotators. This suggests an expensive process that may not scale when curating data for new and diverse domains.
Proprietary Model in Data Synthesis
The GitHub PR-based query synthesis used GPT-5, a proprietary model. This limits the reproducibility of the data generation pipeline for researchers without access to such commercial models.
Rating Explanation
The paper presents a compelling and significant finding that challenges the prevailing scaling-law paradigm in AI, showing that strategic data curation can yield superior agentic performance with drastically fewer samples. The methodology is well explained, and the results are quantitatively strong on the chosen benchmarks. However, the inherent conflict of interest from the authors evaluating their own models and frameworks, the narrow definition of the 'agency' tasks studied, and the labor-intensive data-curation process prevent a top rating.
Good to know
This is our free standard analysis. Paperzilla Pro fact-checks every citation, researches author backgrounds and funding sources, and uses advanced AI reasoning for more thorough insights.
File Information
Original Title:
LIMI: Less is More for Agency
Uploaded:
September 27, 2025 at 04:56 PM
© 2025 Paperzilla. All rights reserved.