The dataset used to train and evaluate the models is relatively small (~7500 images), which may limit the generalizability of the results to real-world scenarios with more diverse data.
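The authors compare YOLOv3 and Faster R-CNN on inference time and average precision (AP) alone; other informative metrics, such as recall and F1-score at a fixed confidence threshold, are not reported. Area under the ROC curve is less suitable here, because true negatives are not well defined in object detection; the full precision-recall curve would be the more appropriate complement. A more comprehensive evaluation would be needed to draw stronger conclusions.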
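To illustrate what threshold-based metrics would add beyond AP, here is a minimal sketch that computes precision, recall, and F1-score for one image and one class. It assumes boxes in (x1, y1, x2, y2) format, greedy matching in descending score order, and the common IoU >= 0.5 criterion. The function names, thresholds, and example boxes are hypothetical and are not taken from the paper's evaluation protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_prf(preds: List[Tuple[Box, float]], gts: List[Box],
                  iou_thr: float = 0.5,
                  score_thr: float = 0.5) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall, F1 for one image and class.

    Predictions scoring below score_thr are dropped; the rest are matched
    greedily (highest score first) to unmatched ground-truth boxes with
    IoU >= iou_thr. Each ground-truth box can be matched at most once.
    """
    kept = sorted((p for p in preds if p[1] >= score_thr),
                  key=lambda p: p[1], reverse=True)
    matched = set()
    tp = 0
    for box, _score in kept:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(kept) - tp  # detections with no matching face
    fn = len(gts) - tp   # faces that were never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: two ground-truth faces, three scored detections.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.3)]
print(detection_prf(preds, gts))  # prints (0.5, 0.5, 0.5)
```

At the default score threshold, one true positive, one false positive, and one missed face give 0.5 for all three metrics. Lowering score_thr admits the third detection and raises recall at some cost in precision. Sweeping that threshold traces the precision-recall curve that AP summarizes, which is why reporting both views is more informative than AP alone.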
Limited Discussion of Limitations
The paper lacks a detailed discussion of the limitations of the proposed method. Analyzing how detection performance changes under variations in mask type, lighting conditions, and face orientation would strengthen the analysis.
Lack of Deployment Considerations
The paper's focus on real-time performance is valuable, but it does not adequately address the challenges of deploying the models on resource-constrained devices such as mobile or embedded hardware. A discussion of optimization techniques and model compression (for example, quantization, pruning, or knowledge distillation) would be beneficial.
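Post-training quantization is one concrete instance of model compression. It stores weights as 8-bit integers instead of 32-bit floats, which cuts weight storage roughly fourfold. The following pure-Python sketch shows symmetric per-tensor int8 quantization. The function names and example weights are hypothetical. Production toolchains also add per-channel scales, activation calibration, and integer kernels, none of which are shown here.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor quantization to [-127, 127].

    A single scale maps the largest-magnitude weight to +/-127; every weight
    is divided by the scale, rounded (Python rounds half to even), and
    clamped to the int8 range.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights for a tiny layer.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
print(q)  # prints [50, -127, 0, 100]
recon = dequantize(q, scale)
# Per-weight reconstruction error is at most scale / 2 (half a quantization step).
print(max(abs(w - r) for w, r in zip(weights, recon)) <= scale / 2 + 1e-12)  # prints True
```

The small weight 0.003 collapses to zero, which shows the accuracy cost that a deployment study would need to measure against the size and latency gains.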