Paper Summary
Paperzilla title
Forget Big Data, Small Smart Data Makes AIs Act Like Geniuses!
This paper introduces LIMI, showing that a strategically curated dataset of just 78 samples can dramatically boost AI agent performance on specific benchmarks (such as "vibe coding" and research workflows), significantly outperforming models trained on much larger, uncurated datasets. This "Less Is More" principle challenges traditional scaling laws for AI agency, suggesting that data quality and curation matter more than sheer volume.
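Mechanically, the recipe amounts to standard supervised fine-tuning, just on a tiny, hand-curated set of agentic trajectories. A minimal sketch of that kind of setup is shown below, assuming Hugging Face's trl library; the base-model id and data file are hypothetical placeholders, not the paper's actual artifacts.

    # Minimal sketch (assumptions, not the authors' code): supervised
    # fine-tuning on a tiny curated dataset via Hugging Face's trl.
    # The model id and data file below are hypothetical placeholders.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # ~78 curated trajectories, one JSON object per line with a "text" field.
    dataset = load_dataset("json", data_files="limi_trajectories.jsonl", split="train")

    trainer = SFTTrainer(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder foundation model
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="limi-sft",
            num_train_epochs=3,             # tiny data permits multiple passes
            per_device_train_batch_size=1,
            learning_rate=2e-5,
        ),
    )
    trainer.train()

The point of the sketch is that nothing in the training loop itself is novel; the claimed gains come from what goes into the 78-sample dataset.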
Possible Conflicts of Interest
Yes, conflicts of interest are identified. Several authors are affiliated with SII-GAIR and ASI, which appear to be involved in developing the LIMI approach, some of the baseline models (e.g., GLM-4.5, GLM-4.5-Air), and the evaluation environment (SII CLI, AgencyBench). This constitutes a conflict, as the researchers are evaluating their own products and frameworks.
Identified Weaknesses
Limited Scope of "Agency" Definition
The study defines and evaluates agency specifically through 'collaborative software development and scientific research workflows.' While these domains are important, the findings might not fully generalize to complex agentic tasks in other domains.
Small Dataset Generalizability
The results are impressive on the tested benchmarks, but the extremely small dataset (78 samples) raises questions about how well these agents would perform on, or adapt to, vastly different, novel agentic tasks or environments not covered by the curated data.
Reliance on Foundation Models
LIMI is a fine-tuning approach applied to existing large language models. The observed improvements in 'agency' are enhancements of pre-existing capabilities rather than capabilities created de novo, and they depend on the quality and architecture of the underlying foundation models.
Labor-Intensive Data Curation
Producing the high-quality training data involves 'human-AI collaborative query collection' and a 'systematic trajectory collection protocol' carried out by PhD-student annotators. This suggests an expensive process that may not scale when curating data for new and diverse domains.
Proprietary Model in Data Synthesis
The GitHub PR-based query synthesis used GPT-5, a proprietary model. This limits the reproducibility of the data generation pipeline for researchers without access to such commercial models.
Rating Explanation
The paper presents a compelling and significant finding that challenges the prevailing scaling-law paradigm in AI, showing that strategic data curation can yield superior agentic performance with drastically fewer samples. The methodology is well explained, and the results are quantitatively strong on the chosen benchmarks. However, the inherent conflict of interest from the authors evaluating their own models and frameworks, the narrow definition of the 'agency' tasks studied, and the labor-intensive data-curation process prevent a top rating.
Good to know
This is our free standard analysis. Paperzilla Pro fact-checks every citation, researches author backgrounds and funding sources, and uses advanced AI reasoning for more thorough insights.
File Information
Original Title:
LIMI: Less is More for Agency
Uploaded:
September 27, 2025 at 04:56 PM
© 2025 Paperzilla. All rights reserved.