LLMs Think in Probabilities, Memories, and Noise (But Also Kinda Reason)


Paper Summary


Large Language Models (LLMs) prompted with Chain-of-Thought (CoT) exhibit a blend of noisy reasoning, probability matching based on output likelihood, and memorization. Their behavior is not pure symbolic reasoning, but accuracy improves substantially with CoT, which suggests a process more nuanced than simple memorization.

Explain Like I'm Five

Scientists found that when computers think step-by-step, they get much better at solving problems. They do this by remembering facts, guessing common answers, and sometimes trying to figure things out even if their thinking is a bit messy.

Possible Conflicts of Interest

None identified

Identified Limitations

Limited task scope

The study focuses solely on shift ciphers, a much simpler problem than real-world reasoning tasks. This is useful for isolating factors in a controlled way, but the conclusions may not generalize to more complex scenarios.
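
For readers unfamiliar with the task, a shift cipher replaces each letter with the one a fixed number of positions later in the alphabet, and decoding moves each letter back by the same amount. The sketch below is a minimal illustration of that idea, not code from the paper; the function name `shift_decode` and the choice to leave non-letters unchanged are assumptions made here.

```python
import string


def shift_decode(ciphertext: str, shift: int) -> str:
    """Decode a shift cipher by moving each letter back `shift` positions.

    Hypothetical helper for illustration; non-letter characters are
    passed through unchanged (an assumption, not taken from the paper).
    """
    decoded = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            decoded.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            decoded.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            decoded.append(ch)
    return "".join(decoded)


if __name__ == "__main__":
    print(shift_decode("fdw", 3))     # cat
    print(shift_decode("uryyb", 13))  # hello
```

The task is deterministic and each letter is decoded independently, which is what makes it convenient for separating genuine step-by-step reasoning from memorized or high-probability answers.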

Unfaithful explanations and reliance on self-conditioning

The models studied exhibit unfaithfulness between reasoning steps and final answers, revealing a reliance on memorization rather than pure reasoning. The success of CoT prompting seems tied to the model outputting helpful text that it can then condition on, suggesting that the reasoning happens in the generated text rather than internally.
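
To make "text for the model to condition on" concrete, the sketch below builds the kind of letter-by-letter trace a step-by-step decoding might write out before giving a final answer. This is a hypothetical format chosen for illustration; the function name `cot_trace`, the `x -> y` step layout, and the `Answer:` line are assumptions, not the prompt or output format used in the paper.

```python
def cot_trace(ciphertext: str, shift: int) -> list[str]:
    """Return a letter-by-letter decoding trace followed by the final answer.

    Hypothetical illustration only: the step format is an assumption,
    and input is lowercased for simplicity.
    """
    steps = []
    decoded = []
    for ch in ciphertext.lower():
        if "a" <= ch <= "z":
            # Move the letter back by `shift`, wrapping around the alphabet.
            plain = chr((ord(ch) - ord("a") - shift) % 26 + ord("a"))
            steps.append(f"{ch} -> {plain}")
        else:
            plain = ch  # non-letters are copied through without a step
        decoded.append(plain)
    steps.append("Answer: " + "".join(decoded))
    return steps


if __name__ == "__main__":
    for line in cot_trace("fdw", 3):
        print(line)
```

In a faithful trace the final answer is exactly the concatenation of the intermediate letters. The unfaithfulness described above corresponds to a model whose final answer differs from what its own written steps spell out.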

Multiple examples in one demonstration

The prompt design contains multiple examples in a single demonstration (one-shot CoT), making it less clear how much of the effect of CoT can be attributed to a single demonstration.

Rating Explanation

This paper presents a strong, focused analysis of LLM reasoning using a clever task (shift ciphers). The methodology isolates key factors and provides quantitative evidence. While limited in scope to a single task, the findings about probabilistic, memorization-influenced noisy reasoning are valuable. No apparent conflicts of interest were found.


Topic Hierarchy

Domain: Physical Sciences

Field: Computer Science

Subfield: Artificial Intelligence

File Information

Original Title: Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

Uploaded: July 08, 2025 at 12:04 PM

Privacy: Public