Large Language Model for OWL Proofs

★ ★ ★ ★ ☆

Paper Summary

Paperzilla title
AI Proves Logic Puzzles, But Gets Tripped Up By Messy Clues

This paper evaluates Large Language Models (LLMs) on their ability to construct and explain proofs over OWL (Web Ontology Language) ontologies. While some models perform strongly, all struggle with conclusions that require complex derivation patterns, with noisy input data, and with incomplete premises. The study finds that logical complexity, rather than input format (formal logic vs. natural language), is the primary factor limiting LLM performance on these tasks.

Explain Like I'm Five

This paper shows that smart computer programs can solve logic puzzles and explain their answers, but they get confused when the puzzles are very tricky, have extra wrong clues, or have missing clues.

Possible Conflicts of Interest

None identified

Identified Limitations

Limited Capacity for Complex Derivations
Models struggle with conclusions requiring derivation patterns beyond simple transitive closures, indicating difficulty with intricate logical structures that demand more sophisticated reasoning.
Sensitivity to Noisy and Incomplete Input Data
Performance declines sharply on imperfect input, dropping by up to 47% with noisy axioms and up to 38% with missing premises, highlighting a lack of resilience in real-world scenarios where data is rarely clean.
Focus on EL-Ontologies
The evaluation primarily uses EL-ontologies, a specific and less expressive fragment of Description Logic, which might limit the generalizability of findings to more complex OWL 2 DL ontologies found in many applications.
Reliance on Specific Prompting Techniques
The study relies on prompt engineering (supplying inference rules or worked examples), which influences model performance and may not reflect the LLMs' intrinsic reasoning ability without explicit guidance.
Token Limit Impact on Complex Examples
In highly nested and intricate axiom sets, LLMs were sometimes unable to produce a complete result within token limits, indicating a practical constraint for very complex real-world ontologies.
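To make the first limitation concrete, the kind of reasoning the models do handle well is chaining SubClassOf axioms into a transitive closure. Below is a minimal Python sketch of that pattern; the class names are illustrative and not taken from the paper's datasets.

```python
# Hypothetical EL-style subsumption axioms, each pair (A, B) meaning "A SubClassOf B".
axioms = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Animal", "LivingThing")}

def transitive_closure(pairs):
    """Repeatedly chain A ⊑ B and B ⊑ C into A ⊑ C until nothing new is entailed."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

entailed = transitive_closure(axioms)
print(("Dog", "LivingThing") in entailed)  # True: derived by chaining three axioms
```

The paper's harder cases involve derivations that cannot be reduced to this simple chaining, e.g. ones combining multiple rule types, which is where LLM accuracy drops.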

Rating Explanation

The paper provides a thorough and systematic evaluation of LLMs for proof construction in OWL ontologies, utilizing multiple models and real-world datasets. It delivers valuable insights into the strengths and significant limitations of LLMs in logical reasoning, particularly concerning complexity, noise, and incomplete premises. The methodology is sound, and the findings are well-supported and contribute meaningfully to the field.

Good to know

This is the Starter analysis. Paperzilla Pro fact-checks every citation, researches author backgrounds and funding sources, and uses advanced AI reasoning for more thorough insights.

File Information

Original Title: Large Language Model for OWL Proofs
Uploaded: January 22, 2026 at 11:39 AM
Privacy: Public