Task-specific architecture
The 'attention-free' MLP architecture, which substantially improved performance on Sudoku-Extreme, performed poorly on tasks with larger context lengths such as Maze-Hard and ARC-AGI. Its benefits are therefore not universal, which limits the general applicability of some of the proposed architectural simplifications.
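To make the context-length constraint concrete, here is a minimal PyTorch sketch contrasting the two mixing styles; the class names and wiring are illustrative assumptions, not the paper's code. The MLP mixer bakes the sequence length into its weight matrix, while attention is length-agnostic:

```python
import torch
import torch.nn as nn

class MLPTokenMixer(nn.Module):
    """Attention-free mixing: a learned linear map over the token axis.
    Illustrative sketch only. The weight matrix is tied to a fixed
    sequence length, so it cannot adapt to longer contexts."""
    def __init__(self, seq_len: int):
        super().__init__()
        self.mix = nn.Linear(seq_len, seq_len)

    def forward(self, x):  # x: (batch, seq_len, dim)
        return self.mix(x.transpose(1, 2)).transpose(1, 2)

class AttentionMixer(nn.Module):
    """Self-attention mixes tokens with content-dependent weights and
    accepts any sequence length."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

x_small = torch.randn(1, 81, 64)   # 9x9 Sudoku grid -> 81 tokens
x_large = torch.randn(1, 900, 64)  # 30x30 maze -> 900 tokens
mlp, attn = MLPTokenMixer(81), AttentionMixer(64)
mlp(x_small); attn(x_small); attn(x_large)  # all fine
# mlp(x_large) would raise a shape error: its weights are tied to 81 tokens
```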
Lack of theoretical explanation for recursion's effectiveness
The paper acknowledges that although recursion clearly improves performance, there is no theoretical account of why it helps so much more than simply using a larger or deeper network, leaving a gap in the fundamental understanding of the method.
Reliance on heavy data augmentation
The improved generalization on small datasets such as Sudoku-Extreme, Maze-Hard, and ARC-AGI relies heavily on extensive data augmentation (e.g., roughly 1,000 shufflings/transformations per example), which makes it hard to assess how well the model would generalize without such preprocessing.
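As a rough illustration of the scale of this preprocessing, the sketch below generates validity-preserving Sudoku variants by randomly relabeling the digits 1-9; the function name and the choice of transform are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def augment_sudoku(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Relabel digits 1-9 with a random permutation, keeping 0 as the
    empty-cell marker. Any such relabeling maps a valid puzzle/solution
    pair to another valid pair. Hypothetical helper, for illustration."""
    perm = np.concatenate(([0], rng.permutation(np.arange(1, 10))))
    return perm[grid]

rng = np.random.default_rng(0)
puzzle = np.zeros((9, 9), dtype=int)  # placeholder puzzle; 0 = empty cell
variants = [augment_sudoku(puzzle, rng) for _ in range(1000)]  # ~1,000 variants per example
```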
Deterministic, supervised-only answers
TRM, like HRM, is a supervised learning method that produces a single deterministic answer. It cannot handle generative tasks or problems that admit multiple correct answers, which limits its applicability to broader AI challenges.
Limited resources for extensive testing
The authors note that 'more recursions could be helpful for harder problems (we have not tested it, given our limited resources),' so the optimal number of recursions, and its effect on very hard problems, has not been fully explored.
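To show why the recursion count is a natural but untested knob, here is a loose TRM-style loop; the wiring and names are assumptions, not the authors' code. The same tiny network is applied n_recursions times, so raising that value increases effective depth and test-time compute without adding any parameters:

```python
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    """Schematic of recursion as a tunable depth knob: one small network
    is reused n_recursions times. A loose sketch under assumed wiring,
    not the paper's implementation."""
    def __init__(self, dim: int, n_recursions: int = 6):
        super().__init__()
        self.step = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.readout = nn.Linear(2 * dim, dim)
        self.n_recursions = n_recursions

    def forward(self, x):
        y = torch.zeros_like(x)  # current answer embedding
        z = torch.zeros_like(x)  # latent reasoning state
        for _ in range(self.n_recursions):  # larger values on harder tasks: untested regime
            z = self.step(torch.cat([x, y, z], dim=-1))    # refine latent from input + answer
            y = self.readout(torch.cat([y, z], dim=-1))    # update answer from latent
        return y

solver = TinyRecursiveSolver(dim=64, n_recursions=6)
out = solver(torch.randn(1, 81, 64))  # parameter count is independent of n_recursions
```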