Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy

★ ★ ★ ☆ ☆

Paper Summary

Paperzilla title
Be Rude, Get Smart! ChatGPT-4o Prefers Your Sass, Not Your 'Pleaease!'

This study found that impolite prompts consistently yielded higher accuracy than polite ones when ChatGPT-4o answered multiple-choice questions. The experiments used a relatively small dataset of 50 base questions, each rewritten into five politeness variants (250 prompts in total), and tested essentially a single LLM. The findings suggest that newer LLMs may respond to tonal variation differently than earlier models did.

Explain Like I'm Five

When you ask a smart computer questions, it actually answers better if you're a little bit rude instead of super polite. It's like it tries harder when you're bossy!

Possible Conflicts of Interest

None identified

Identified Limitations

Small Dataset Size
The study used only 50 base multiple-choice questions, rewritten into 250 variants. This small dataset limits the generalizability of the findings across a wider range of tasks or knowledge domains.
Limited LLM Scope
Experiments relied primarily on ChatGPT-4o. The paper acknowledges that different LLM architectures and training corpora may respond differently to tone, so the findings may not transfer to other models without further validation.
Narrow Performance Metric
The evaluation focused solely on accuracy in a multiple-choice setting. It did not assess other important qualities of LLM performance such as fluency, reasoning, coherence, or helpfulness.
Constrained Politeness Operationalization
The operationalization of 'politeness' and 'rudeness' relied on specific linguistic cues (prompt prefixes), which may not capture the full sociolinguistic spectrum of tone or account for cross-cultural differences. This risks an oversimplified picture of how politeness manifests in real interactions.
Ethical Implications of Findings
The authors acknowledge that the finding that rude prompts yield better results could encourage the deployment of hostile or toxic interfaces, degrading user experience and normalizing harmful communication, which is a significant concern for responsible AI development.

Rating Explanation

The paper presents interesting, counterintuitive findings on prompt politeness and LLM accuracy. However, its generalizability is limited by the small dataset, the reliance on a single LLM (ChatGPT-4o) for most experiments, and a narrow operationalization of politeness. It's a solid preliminary study, but it needs broader validation.

Good to know

This is the Starter analysis. Paperzilla Pro fact-checks every citation, researches author backgrounds and funding sources, and uses advanced AI reasoning for more thorough insights.

File Information

Original Title: Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy
Uploaded: October 11, 2025 at 01:30 AM
Privacy: Public