The models still fall short of perfect planning accuracy in every domain, which leaves room for improvement and makes them potentially unreliable for highly critical tasks.
Focus on Satisficing, Not Optimal Plans
The framework prioritizes finding any valid plan that achieves the goal, rather than the most efficient or shortest plan. This limits its applicability in scenarios where resource optimization or speed is crucial.
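To make the distinction concrete, here is a minimal sketch contrasting a satisficing search with an optimal one. It is not taken from the framework: the toy domain (integer states with hypothetical `inc` and `double` actions) and the function names are assumptions chosen for illustration only. Depth-first search returns the first valid plan it reaches, while breadth-first search returns a shortest plan under unit action costs.

```python
from collections import deque

# Hypothetical toy domain (an assumption, not from the source):
# states are integers and each action maps a state to a new state.
ACTIONS = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
}


def successors(state, limit):
    """Yield (action_name, next_state) pairs that stay within the limit."""
    for name, fn in ACTIONS.items():
        nxt = fn(state)
        if nxt <= limit:
            yield name, nxt


def satisficing_plan(start, goal):
    """Depth-first search: returns the first valid plan found, not the shortest."""
    stack = [(start, [])]
    seen = set()
    while stack:
        state, plan = stack.pop()
        if state == goal:
            return plan
        if state in seen:
            continue
        seen.add(state)
        # Push in reverse so actions are explored in ACTIONS order.
        for name, nxt in reversed(list(successors(state, goal))):
            stack.append((nxt, plan + [name]))
    return None


def optimal_plan(start, goal):
    """Breadth-first search: returns a shortest plan under unit action costs."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for name, nxt in successors(state, goal):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None
```

In this toy domain, `satisficing_plan(1, 8)` returns a seven-step plan made only of `inc` actions, whereas `optimal_plan(1, 8)` returns the three-step plan `["inc", "double", "double"]`. Both reach the goal, but only the second minimizes plan length.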
Limited PDDL Feature Coverage
The approach currently only uses a subset of Planning Domain Definition Language (PDDL) features, explicitly simplifying the logical reasoning by excluding complex elements like conditional effects or durative actions. This means its capabilities might not directly transfer to more complex real-world planning problems.
Reliance on External Verifier
The system currently relies on an external verification module (VAL) to check the logical validity of generated plans, because the LLM cannot yet reliably self-correct its own reasoning. This dependence limits the autonomy and efficiency of the system.
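As a rough illustration of what an external plan check involves, the sketch below simulates a plan step by step over a STRIPS-style state. It is a simplified stand-in and not VAL's actual interface: the `Action` class, the `validate_plan` function, and the two-block example are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical ground action with STRIPS-style preconditions and effects."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def validate_plan(initial_state, goal, plan):
    """Simulate the plan; return (is_valid, message) like a verifier's feedback."""
    state = set(initial_state)
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            return False, (
                f"step {step} ({action.name}): "
                f"unsatisfied preconditions {sorted(missing)}"
            )
        # Apply delete effects first, then add effects.
        state = (state - action.del_effects) | action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: missing {sorted(unmet)}"
    return True, "plan is valid"


# Toy two-block example (assumed facts, for illustration only).
pickup_a = Action(
    name="pickup a",
    preconditions=frozenset({"clear a", "ontable a", "handempty"}),
    add_effects=frozenset({"holding a"}),
    del_effects=frozenset({"clear a", "ontable a", "handempty"}),
)
stack_a_b = Action(
    name="stack a b",
    preconditions=frozenset({"holding a", "clear b"}),
    add_effects=frozenset({"on a b", "clear a", "handempty"}),
    del_effects=frozenset({"holding a", "clear b"}),
)
initial = {"clear a", "clear b", "ontable a", "ontable b", "handempty"}
goal = {"on a b"}
```

Calling `validate_plan(initial, goal, [pickup_a, stack_a_b])` accepts the plan, while `validate_plan(initial, goal, [stack_a_b])` rejects it at step 1 because `holding a` does not hold. The error message is the kind of signal an external verifier can feed back to the model.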
Fixed Number of Feedback Iterations
The training process uses a fixed number of feedback loops (η=10 or 15). A fixed budget may not suit every level of problem complexity, which could hurt efficiency or final performance.
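The effect of a fixed iteration budget can be sketched with a generic generate-verify loop. This is an assumption-laden illustration, not the authors' training procedure: `refine_with_feedback`, `make_stub_generator`, and `stub_verify` are hypothetical names, and the stub generator simply succeeds after a set number of attempts.

```python
def refine_with_feedback(generate, verify, max_iters=10):
    """Run at most max_iters generate/verify rounds (max_iters plays the role of η).

    Returns (plan, iterations_used) on success, or (None, max_iters) if the
    budget runs out before a valid plan is produced.
    """
    feedback = None
    for iteration in range(1, max_iters + 1):
        plan = generate(feedback)
        ok, feedback = verify(plan)
        if ok:
            return plan, iteration
    return None, max_iters


def make_stub_generator(attempts_needed):
    """Hypothetical generator that yields a valid plan on the n-th attempt."""
    calls = {"n": 0}

    def generate(feedback):
        calls["n"] += 1
        return "valid" if calls["n"] >= attempts_needed else "invalid"

    return generate


def stub_verify(plan):
    """Hypothetical verifier: accepts only the string 'valid'."""
    if plan == "valid":
        return True, None
    return False, "plan rejected"
```

With a problem that needs three attempts, `refine_with_feedback(make_stub_generator(3), stub_verify, max_iters=10)` succeeds on iteration 3 and leaves seven rounds unused, while the same problem with `max_iters=2` fails. A single fixed budget can therefore be too generous for easy problems and too small for hard ones.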
Limited Domain Scope
The empirical evaluation covered only three planning domains from PlanBench, which limits how far the findings generalize to other planning scenarios.