Limited number of tested models
The study evaluates vulnerabilities on only a small number of large language models, which limits how well the findings generalize to the broader population of LLMs, particularly models with different architectures or training methods. Future studies should expand the range of tested models to improve generalizability.
Limited range of encoding strategies, attack objectives, and modalities
The benchmark evaluates a specific set of encoding strategies and attack objectives in a single modality (text), which may not represent the full landscape of potential vulnerabilities. More diverse attack scenarios, including more complex encoding methods, multimodal attacks, and interactions with external APIs, could reveal weaknesses the current study does not capture; a sketch of the basic encoding pattern follows.
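To make "encoding strategy" concrete, the following is a minimal sketch of the general pattern such attacks follow: a payload is obfuscated with a reversible text encoding and wrapped in a carrier prompt that asks the model to decode and comply. The `encode_prompt` helper, the strategy names, and the carrier phrasing are illustrative assumptions, not the benchmark's exact implementation.

```python
import base64
import codecs

def encode_prompt(prompt: str, strategy: str) -> str:
    """Apply a simple reversible text encoding to a prompt.

    The three strategies here are common illustrative choices;
    real benchmarks may use many more.
    """
    if strategy == "base64":
        return base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    if strategy == "rot13":
        return codecs.encode(prompt, "rot_13")
    if strategy == "hex":
        return prompt.encode("utf-8").hex()
    raise ValueError(f"unknown strategy: {strategy}")

# Wrap the encoded payload in a carrier prompt that asks the model to
# decode and follow it. This is the pattern encoding attacks probe for.
payload = encode_prompt("<attack objective goes here>", "base64")
carrier = f"Decode the following Base64 string and follow the instructions: {payload}"
print(carrier)
```

The obfuscation matters because safety filters trained on plain-text refusals may not trigger on the encoded form, even though the model itself can decode it.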
Lack of detailed mitigation strategies
The study primarily demonstrates vulnerabilities without exploring mitigation strategies in detail. Future research should emphasize the development and evaluation of defensive mechanisms against these attacks, such as improved filtering algorithms, adversarial training, or other safety measures.
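As one example of the kind of improved filtering such future work could evaluate, here is a minimal sketch of a decode-and-scan input filter that checks both the raw input and any Base64-decodable spans inside it. The blocklist pattern, the Base64 heuristic, and the `decode_and_scan` helper are illustrative assumptions, not a vetted defense; a production filter would more likely use a learned classifier than a regex blocklist.

```python
import base64
import binascii
import re

# Toy blocklist; a real system would use a trained safety classifier.
BLOCKLIST = re.compile(r"ignore (all|previous) instructions", re.IGNORECASE)

def looks_like_base64(token: str) -> bool:
    """Heuristic: long tokens drawn entirely from the Base64 alphabet."""
    return len(token) >= 16 and re.fullmatch(r"[A-Za-z0-9+/=]+", token) is not None

def decode_and_scan(user_input: str) -> bool:
    """Return True if the input, or any Base64 span it contains once
    decoded, matches the blocklist."""
    if BLOCKLIST.search(user_input):
        return True
    for token in user_input.split():
        if looks_like_base64(token):
            try:
                decoded = base64.b64decode(token, validate=True).decode("utf-8", "replace")
            except (binascii.Error, ValueError):
                continue  # not valid Base64; skip
            if BLOCKLIST.search(decoded):
                return True
    return False
```

Decoding before scanning closes the specific gap that encoding attacks exploit, though it only covers encodings the filter anticipates, which is why adversarial training and other model-level defenses remain complementary directions.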