CMT: MID-TRAINING FOR EFFICIENT LEARNING OF CONSISTENCY, MEAN FLOW, AND FLOW MAP MODELS
The paper introduces Consistency Mid-Training (CMT), an intermediate training stage that improves the efficiency, stability, and performance of flow map models for vision generation. CMT sits between pre-training (diffusion models) and post-training (flow map models) and supplies a trajectory-consistent initialization for the post-training stage. This reduces total training cost (data and GPU time) by up to 98% compared to baselines while achieving state-of-the-art FID scores on several image generation benchmarks. The theoretical analysis shows that CMT gives flow map post-training a strong starting point, minimizing gradient bias and accelerating convergence.