LLMs Not Dumb, Just Too Wordy? Study Finds AI Can Solve Puzzles If We Don't Make Them Write Novels About It

Paper Summary

The paper argues that the "accuracy collapse" a previous study (Shojaee et al., 2025) reported for Large Reasoning Models on complex planning puzzles stems from experimental design limitations, specifically output token limits and unsolvable problem instances. Using alternative representations that bypass these limitations, the authors suggest that models can solve tasks previously deemed too complex.
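
To make the output-limit argument concrete, here is a minimal sketch using Tower of Hanoi as an illustrative planning puzzle (the summary above does not name a specific puzzle, so this choice is an assumption). A full solution needs 2^n - 1 moves, so a complete move listing grows exponentially with the number of disks, while a program that generates the moves stays a few lines long. The `tokens_per_move` and `budget` values are hypothetical placeholders, not figures from the paper.

```python
def hanoi_moves(n, src="A", dst="C", aux="B"):
    """Yield every move (disk, from_peg, to_peg) of the optimal solution."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, move disk n, then restack.
    yield from hanoi_moves(n - 1, src, aux, dst)
    yield (n, src, dst)
    yield from hanoi_moves(n - 1, aux, dst, src)


def estimated_output_tokens(n, tokens_per_move=10):
    """Rough size of a full move listing.

    tokens_per_move is an assumed figure for illustration only.
    """
    return (2 ** n - 1) * tokens_per_move


def fits_budget(n, budget=64_000, tokens_per_move=10):
    """Check whether a full listing fits an assumed output token budget."""
    return estimated_output_tokens(n, tokens_per_move) <= budget


if __name__ == "__main__":
    for disks in (5, 10, 12, 13, 15):
        print(disks, estimated_output_tokens(disks), fits_budget(disks))
```

Under these assumed numbers, a full listing fits the budget up to 12 disks and exceeds it from 13 disks on, even though the generator itself never changes size. That gap between the length of the answer and the difficulty of producing it is the distinction the critique turns on.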

Explain Like I'm Five

Scientists found that when AI seemed to fail hard puzzles, it was often because the test was unfair or the puzzles were impossible. When the tests were made fairer, the AI could solve them after all!

Possible Conflicts of Interest

The authors are affiliated with Anthropic and Open Philanthropy, which may have interests in promoting positive views of AI capabilities. However, the critique primarily addresses methodological concerns, reducing the likelihood of significant bias.

Identified Limitations

Lack of Novel Contribution

The paper primarily critiques a previous study's methodology, rather than presenting novel research. The focus is on demonstrating how experimental design flaws led to misinterpretations of LLM capabilities, specifically regarding output constraints and problem solvability.

Weak Supporting Evidence

The authors mention conducting "preliminary testing" with alternative representations, but the details are scarce. The sample size is acknowledged as insufficient for statistical significance, limiting the strength of their counter-arguments. More robust experimentation is needed to support their claims of restored performance.

Over-Reliance on Anecdotal Evidence

The paper relies heavily on anecdotal evidence, such as a tweet and model outputs expressing awareness of length limits. While illustrative, these examples do not constitute rigorous scientific proof.

Rating Explanation

The paper provides a valuable critique of experimental design in AI research, highlighting the importance of considering output constraints. However, its lack of novel findings and its reliance on preliminary tests and anecdotal evidence limit its impact.

Topic Hierarchy

Domain: Physical Sciences

Field: Computer Science

Subfield: Artificial Intelligence

File Information

Original Title: The Illusion of the Illusion of Thinking: A Comment on Shojaee et al. (2025)

Uploaded: July 08, 2025 at 12:15 PM

Privacy: Public