Stereochemistry Violations
While the model predicts biomolecular structures accurately across several categories, including protein-protein, protein-nucleic acid, and protein-ligand interactions, it has limitations in specific areas. One is occasional stereochemistry violations: the predicted structures do not always respect the correct chirality. The ranking formula for model predictions includes a penalty for chirality violations, but the issue has not been fully resolved. The model also tends to generate structures with overlapping atoms (clashes), particularly in protein-nucleic acid complexes that contain large numbers of both nucleotides and residues.
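The two checks described above can be illustrated with a minimal sketch. It is not the AlphaFold 3 implementation. It assumes that chirality at a tetrahedral center can be compared through the sign of a signed volume, and that a clash is any non-bonded atom pair closer than a distance cutoff. The function names and the 1.1 angstrom default are assumptions chosen for illustration.

```python
import numpy as np

def signed_volume(center, a, b, c):
    """Signed volume spanned by three substituents around a center atom.

    The sign flips when the center is mirrored, so it acts as a simple
    handedness indicator for a tetrahedral center.
    """
    return float(np.dot(a - center, np.cross(b - center, c - center)))

def chirality_matches(pred, ref):
    """Compare handedness of a predicted center against a reference.

    pred, ref: arrays of shape (4, 3) holding the center followed by
    three neighbors, listed in the same order in both arrays.
    """
    return bool(np.sign(signed_volume(*pred)) == np.sign(signed_volume(*ref)))

def count_clashes(coords, bonded, cutoff=1.1):
    """Count non-bonded atom pairs closer than `cutoff` (angstroms).

    bonded: set of (i, j) index pairs with i < j that are covalently
    bonded and therefore excluded. The cutoff value is an assumption.
    """
    coords = np.asarray(coords, dtype=float)
    clashes = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) in bonded:
                continue
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                clashes += 1
    return clashes
```

Mirroring a center (for example, negating one coordinate axis) flips the sign of the volume, so chirality_matches returns False for a prediction with inverted handedness. The pairwise loop in count_clashes is quadratic in the number of atoms; for large complexes a spatial index such as a k-d tree would be the usual replacement.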
The shift from a non-generative to a diffusion-based model introduces a further challenge: spurious structural order in disordered regions, also known as hallucination. Although these hallucinated regions are marked with low confidence, they lack the distinctive ribbon-like appearance that AlphaFold 2 produces for disordered regions. To mitigate this, the authors use distillation training from AlphaFold 2 predictions and add a ranking term that encourages ribbon-like predictions. These limitations nevertheless persist and can affect the accuracy of structure prediction.
Limited Dynamics and Conformational Coverage
Like other protein structure prediction models, AlphaFold 3 primarily predicts static structures of the kind observed in the Protein Data Bank (PDB); it does not capture the dynamic behavior of biomolecular systems in solution. Running multiple random seeds, for either the diffusion head or the overall network, does not approximate the solution ensemble, so the sampled conformational states may be inaccurate or incomplete. The model also may not predict the correct conformational state, especially for systems that undergo significant conformational changes upon ligand binding. For instance, AlphaFold 3 predicts a closed state for both the apo and holo forms of E3 ubiquitin ligases, whereas the open state is observed in apo structures.
Computational Cost and Target Dependence
Achieving the highest accuracy with AlphaFold 3 may require generating many predictions and ranking them, which increases computational cost. This is most noticeable for antibody-antigen complexes, where the accuracy of the top-ranked prediction continues to improve as the number of model seeds grows, up to 1,000. Other molecule types do not show this dependence on seed count, which suggests that specific interaction types need further optimization or alternative strategies.
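The generate-and-rank procedure can be sketched as follows. This is a hedged illustration, not the AlphaFold 3 ranking formula: the Prediction fields, the penalty weights, and the predict_fn callable are all hypothetical stand-ins for whatever a real pipeline exposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Prediction:
    seed: int
    confidence: float          # hypothetical scalar confidence, higher is better
    chirality_violations: int  # number of centers with wrong handedness
    has_clash: bool            # True if the structure contains a clash

def ranking_score(p: Prediction,
                  chirality_weight: float = 1.0,
                  clash_weight: float = 100.0) -> float:
    """Confidence minus penalties; both weights are illustrative assumptions."""
    return (p.confidence
            - chirality_weight * p.chirality_violations
            - clash_weight * float(p.has_clash))

def best_of_seeds(predict_fn: Callable[[int], Prediction],
                  seeds: Iterable[int]) -> Prediction:
    """Run one prediction per seed and return the top-ranked result.

    Cost grows linearly with the number of seeds, which is the
    trade-off described in the text.
    """
    predictions = [predict_fn(seed) for seed in seeds]
    return max(predictions, key=ranking_score)
```

With a real model, predict_fn would wrap a full inference run, so evaluating 1,000 seeds costs roughly 1,000 times as much as a single prediction. The penalties ensure that a confident but clashing or stereochemically invalid structure does not win the ranking.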