Paper Summary
Paperzilla title
AI Agents Get Smarter by Remembering Their Screw-Ups (and Successes!)
This paper introduces REASONINGBANK, a new memory framework that helps AI agents learn from both successful and failed experiences to develop generalizable reasoning strategies. It also proposes memory-aware test-time scaling (MATTS) to enhance this learning by generating diverse experiences during tasks. The approach significantly improves agents' effectiveness and efficiency on web browsing and software engineering benchmarks compared to existing memory systems.
Possible Conflicts of Interest
A significant number of authors are affiliated with 'Google Cloud AI Research' and 'Google Cloud AI', and the experiments primarily use Google's own proprietary models (Gemini-2.5-flash, Gemini-2.5-pro). This constitutes a potential conflict of interest: the authors are evaluating a system that leverages, and whose success reflects favorably on, technology developed by their employer.
Identified Weaknesses
Dependence on LLM-as-a-judge for correctness signals
The system relies on an LLM to judge whether each task succeeded or failed. This 'AI judge' can err or be uncertain, introducing noise into the memory-induction process. While the paper claims robustness to such noise, more reliable verification signals would further strengthen memory induction.
Simplicity in memory retrieval and consolidation
The authors intentionally use basic mechanisms: embedding-based similarity search for retrieving memories and simple appending for consolidating new ones. While this isolates the core contribution (the memory content itself), it means the system does not leverage more advanced techniques, such as adaptive retrieval or hierarchical consolidation, that could offer further performance gains.
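To make the retrieval mechanism concrete, here is a minimal illustrative sketch of what embedding-based similarity search with simple appending looks like. This is not the authors' code; the class and method names are hypothetical, and real systems would use a learned embedding model rather than the toy vectors shown here.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class MemoryBank:
    """Hypothetical sketch of a reasoning-memory store:
    strategies are appended with their embeddings and
    retrieved by cosine similarity to a query embedding."""

    def __init__(self):
        self.items = []  # list of (embedding, strategy_text)

    def add(self, embedding, strategy):
        # "Simple addition": append only, no deduplication or consolidation.
        self.items.append((embedding, strategy))

    def retrieve(self, query_embedding, k=2):
        # Return the k strategies most similar to the query.
        ranked = sorted(self.items,
                        key=lambda it: cosine(it[0], query_embedding),
                        reverse=True)
        return [strategy for _, strategy in ranked[:k]]

bank = MemoryBank()
bank.add([1.0, 0.0], "Check login state before navigating")
bank.add([0.0, 1.0], "Prefer site search over manual browsing")
bank.add([0.9, 0.1], "Re-verify form fields after submission")
print(bank.retrieve([1.0, 0.1], k=2))
# → ['Re-verify form fields after submission', 'Check login state before navigating']
```

The append-only `add` is exactly the simplicity the weakness refers to: nothing merges near-duplicate strategies or prunes stale ones, which adaptive or hierarchical consolidation schemes would address.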
Focus on memory content over architecture
The study prioritizes what information is stored in memory (content) over how memory is structured (architecture). This limits direct comparison with other memory architectures, such as episodic or hierarchical memory, which address different aspects of agent intelligence.
Rating Explanation
The paper presents a strong technical contribution with a novel memory framework and test-time scaling method that demonstrates significant improvements in agent performance and efficiency on relevant benchmarks. The methodology is clearly described, and important limitations are acknowledged. However, the presence of a notable conflict of interest, with authors from Google extensively using and promoting Google's own AI models, slightly impacts the overall rating despite the paper's scientific merit.
Good to know
This is our free standard analysis. Paperzilla Pro fact-checks every citation, researches author backgrounds and funding sources, and uses advanced AI reasoning for more thorough insights.
File Information
Original Title:
REASONINGBANK: Scaling Agent Self-Evolving with Reasoning Memory
Uploaded:
October 11, 2025 at 07:07 AM
© 2025 Paperzilla. All rights reserved.