Unfolding Protein Prediction with Scale

SeedFold achieves state-of-the-art performance on the FoldBench benchmark by scaling folding models along three axes: model capacity, through a Pairformer width of 512; architecture, by employing linear triangular attention to reduce computational complexity; and data, by leveraging large-scale distillation to expand training to 26.5 million samples. This is demonstrated by its 384-width variant, SeedFold-Linear.
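Standard triangular attention over an N×N pair representation lets each pair (i, j) attend over every k in its row, which builds an N×N×N weight tensor and costs O(N³·d). The sketch below shows one generic way to make this linear in row length. It is an assumption, not SeedFold's published formulation: it swaps softmax for a kernel feature map (elu(x) + 1, as in kernelized "linear attention") so the per-row key/value summary can be computed once and reused, giving O(N²·d²). The function names are hypothetical, and the pair-bias and gating terms of real triangular attention are omitted. A bias term on (j, k) does not factor this way, and how SeedFold handles it is not described here.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: strictly positive, so kernel weights and normalisers are > 0
    return np.where(x > 0, x + 1.0, np.exp(x))

def kernel_row_attention_cubic(q, k, v):
    # q, k, v: (N, N, d) pair tensors. Pair (i, j) attends over all (i, k) in row i.
    # Materialises an (N, N, N) weight tensor: O(N^3 * d) time.
    w = np.einsum("ijd,ikd->ijk", feature_map(q), feature_map(k))
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum("ijk,ikd->ijd", w, v)

def kernel_row_attention_linear(q, k, v):
    # Same result, reassociated: summarise keys/values once per row.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = np.einsum("ikd,ike->ide", phi_k, v)          # (N, d, d) per-row summary
    z = phi_k.sum(axis=1)                              # (N, d) per-row normaliser
    num = np.einsum("ijd,ide->ije", phi_q, kv)         # (N, N, d)
    den = np.einsum("ijd,id->ij", phi_q, z)[..., None]  # (N, N, 1)
    return num / den                                   # O(N^2 * d^2) overall

# Usage on a small random pair representation
rng = np.random.default_rng(0)
N, d = 6, 4
q, k, v = (rng.standard_normal((N, N, d)) for _ in range(3))
out = kernel_row_attention_linear(q, k, v)
```

Both functions compute the same kernelized attention. Only the order of the matrix products differs, which is what removes the cubic term when N is large relative to d.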

A new model, SeedFold, demonstrates that scaling both data and model size, combined with a novel attention mechanism, dramatically improves the accuracy of biomolecular structure prediction.