Underperformance on CIFAR-10 attributed to a heavily tuned baseline
The authors attribute their method's underperformance on CIFAR-10 relative to standard Flow Matching to the baseline's extensively tuned noise and sampling schedules. If so, EqM may not be universally superior without comparable dataset-specific tuning; alternatively, the CIFAR-10 comparison may simply be unfavorable to EqM rather than evidence of a fundamental advantage of Flow Matching. Either way, the claim deserves a controlled ablation rather than an appeal to baseline tuning.
Stability concerns with the L2-norm variant of the explicit energy model
The L2-norm variant of the explicit energy model is reported to be sensitive to initialization and 'harder to optimize' than the dot-product variant. This points to fragility or added complexity in some formulations of the method, which could hinder broader adoption or require more expert tuning.
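For concreteness, the contrast between the two parameterizations can be illustrated with a minimal PyTorch sketch. This is only an assumption-laden illustration of generic dot-product vs. L2-norm scalar energies whose gradients are taken by autograd; it is not the paper's exact formulation, and the names (`f_theta`, `energy_dot`, `energy_l2`) are hypothetical.

```python
# Minimal sketch (not the paper's formulation): two generic ways to turn a network
# output f_theta(x) into a scalar energy whose gradient can drive sampling.
import torch
import torch.nn as nn

f_theta = nn.Sequential(nn.Linear(2, 128), nn.SiLU(), nn.Linear(128, 2))

def energy_dot(x):
    # Dot-product parameterization: E(x) = <f_theta(x), x>, one scalar per sample.
    return (f_theta(x) * x).sum(dim=-1)

def energy_l2(x):
    # L2-norm parameterization: E(x) = ||f_theta(x)||^2; the quadratic dependence on
    # the network output is one plausible reason this form is more sensitive to
    # initialization scale than the dot-product form.
    return f_theta(x).pow(2).sum(dim=-1)

def grad_energy(energy_fn, x):
    # The gradient field is obtained via autograd rather than predicted directly.
    x = x.detach().requires_grad_(True)
    e = energy_fn(x).sum()
    return torch.autograd.grad(e, x)[0]

x = torch.randn(4, 2)
print(grad_energy(energy_dot, x).shape)  # torch.Size([4, 2])
print(grad_energy(energy_l2, x).shape)   # torch.Size([4, 2])
```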
Theoretical justifications rely on strong assumptions
The theoretical statements about learned gradients, the properties of local minima, and convergence rates are predicated on assumptions such as 'perfect training', high-dimensional settings, and L-smoothness. These assumptions are standard in theoretical analyses, but perfect training is unachievable in practice, and how far the guarantees degrade when the assumptions hold only approximately needs further empirical validation.
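To make explicit what the L-smoothness assumption buys, consider gradient descent $x_{k+1} = x_k - \eta \nabla E(x_k)$ on an energy $E$ (a textbook-style statement offered for context, not the paper's theorem):

$$
\|\nabla E(x) - \nabla E(y)\| \le L\,\|x - y\|
\;\Longrightarrow\;
E(x_{k+1}) \le E(x_k) - \tfrac{\eta}{2}\,\|\nabla E(x_k)\|^2
\quad \text{for } \eta \le \tfrac{1}{L},
$$

and telescoping gives $\min_{k<K} \|\nabla E(x_k)\|^2 \le 2\,(E(x_0) - E^{*})/(\eta K)$, i.e. an $O(1/K)$ rate to a stationary point. Whether a learned energy landscape is smooth with a usable constant $L$ is exactly the kind of question that needs empirical support.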
Limited comparative evaluation of the claimed unique properties
While EqM demonstrates novel capabilities (denoising, OOD detection, and model composition), the experiments for these capabilities are not consistently benchmarked against current state-of-the-art methods in the respective subfields, so it is hard to claim superiority on these tasks with confidence; as sketched below, even the OOD-detection result depends on the evaluation protocol and baselines chosen.
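For instance, a generic energy-score OOD evaluation looks roughly as follows (a minimal sketch under assumed conventions, not the paper's protocol; `pairwise_auroc` and the toy energy are hypothetical stand-ins):

```python
# Generic energy-score OOD evaluation sketch: higher learned energy is treated as
# more out-of-distribution, and AUROC is estimated from in/out score separation.
import torch

def ood_scores(energy_fn, x):
    # Score each sample by its (learned) energy; no gradients needed at test time.
    with torch.no_grad():
        return energy_fn(x)

def pairwise_auroc(scores_in, scores_out):
    # AUROC estimated as P(OOD score > in-distribution score).
    return (scores_out.view(1, -1) > scores_in.view(-1, 1)).float().mean().item()

# Toy stand-in energy: squared distance from the origin; a real evaluation would use
# the trained EqM energy and identical data splits for every method compared.
energy_fn = lambda x: x.pow(2).sum(dim=-1)
x_in = torch.randn(256, 2)          # in-distribution: standard normal
x_out = torch.randn(256, 2) + 3.0   # shifted distribution as a crude OOD proxy
print(pairwise_auroc(ood_scores(energy_fn, x_in), ood_scores(energy_fn, x_out)))
```

Running this same protocol with dedicated OOD-detection baselines on identical splits is what a definitive comparison would require.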