The Protein Folding Problem — Solved?

For over 50 years, one of biology's greatest challenges was the protein folding problem: given a protein's amino acid sequence, can we predict the three-dimensional shape it will fold into? That shape largely determines what a protein does: its function, its interactions, its role in health and disease. In 2020, at the CASP14 assessment, DeepMind's AlphaFold2 achieved a leap in accuracy that stunned the scientific community, with predictions for many single-chain proteins approaching experimental accuracy. Many researchers regard this as effectively solving the structure prediction problem for a large share of protein sequences.

Why Protein Structure Matters So Much

Proteins are the molecular machines of life. Enzymes, antibodies, structural proteins, receptors, and many hormones are proteins, and they work through their 3D shape. A protein's structure reveals:

  • Its active site — where catalysis or binding occurs
  • Potential drug binding pockets
  • How it interacts with other proteins, DNA, or small molecules
  • How mutations might disrupt its function (relevant to genetic disease)

Traditionally, determining a protein's structure required laborious experimental techniques such as X-ray crystallography, cryo-electron microscopy (cryo-EM), or NMR spectroscopy, each often taking months to years per protein. AlphaFold can predict a structure in minutes to hours, depending on the protein's length and the available hardware.

How AlphaFold Works

AlphaFold2 is a deep learning system whose core module, the Evoformer, jointly processes two key types of information:

  1. Multiple Sequence Alignment (MSA): AlphaFold examines hundreds of evolutionarily related sequences from other organisms. Positions that have co-evolved (changed together) tend to be physically close in the 3D structure — this is an evolutionary fingerprint of contacts.
  2. Pairwise residue features: The model learns which amino acid pairs are likely to be near each other in space, building up a probabilistic map of the structure.
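
The co-evolution signal described in item 1 can be illustrated with a minimal Python sketch. It scores every pair of alignment columns by mutual information, a simple statistic that is high when two positions vary together. The alignment toy_msa and all function names are invented for illustration. AlphaFold2 itself does not compute this statistic explicitly; it learns such patterns from the MSA, and dedicated co-evolution methods additionally correct for phylogenetic bias and indirect couplings.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def top_coevolving_pairs(msa, k=3):
    """Rank column pairs (0-based indices) by mutual information, highest first."""
    length = len(msa[0])
    scores = []
    for i, j in combinations(range(length), 2):
        scores.append((mutual_information(column(msa, i), column(msa, j)), i, j))
    scores.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j, round(mi, 3)) for mi, i, j in scores[:k]]


# Toy alignment, invented for illustration: columns 1 and 4 always change together.
toy_msa = [
    "MKVLDA",
    "MKVLDA",
    "MRVLEA",
    "MRVLEA",
    "MKILDA",
    "MRILEA",
]

print(top_coevolving_pairs(toy_msa, k=1))  # [(1, 4, 1.0)]
```

On this toy alignment, columns 1 and 4 switch between K/D and R/E in lockstep, so that pair ranks first with a score of 1 bit. Fully conserved columns score zero because they carry no variation to compare.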

These representations are iteratively refined by "recycling" (passing the output back as input multiple times). A structure module then converts them into final 3D coordinates, with a confidence score (pLDDT, on a scale from 0 to 100) for each residue.
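
To make the per-residue confidence score concrete, here is a small Python sketch that reads pLDDT values from a structure file. AlphaFold's PDB-format output stores pLDDT in the B-factor field of each atom record, so the sketch reads that fixed-width column from the C-alpha atoms. The function names, the toy_atom_line helper, and the toy records are hypothetical; the band boundaries (90, 70, 50) follow the legend used by the AlphaFold database.

```python
def plddt_per_residue(pdb_text):
    """Return {residue number: pLDDT} read from the C-alpha atoms of a PDB file."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26,
        # B-factor 61-66 (AlphaFold stores pLDDT in the B-factor field).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to a coarse confidence label."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def toy_atom_line(serial, atom_name, res_name, res_num, plddt):
    # Hypothetical helper: writes a PDB-style ATOM record with dummy coordinates
    # so the example does not depend on a downloaded file.
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, atom_name, res_name, res_num, 0.0, 0.0, 0.0, 1.0, plddt)


toy_pdb = "\n".join([
    toy_atom_line(1, "N", "MET", 1, 45.2),
    toy_atom_line(2, "CA", "MET", 1, 45.2),
    toy_atom_line(3, "CA", "LYS", 2, 93.1),
    toy_atom_line(4, "CA", "VAL", 3, 72.5),
])

for res_num, score in plddt_per_residue(toy_pdb).items():
    print(res_num, score, confidence_band(score))
```

Reading one atom per residue is enough here because, in AlphaFold's PDB output, every atom in a residue carries the same pLDDT value.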

The AlphaFold Protein Structure Database

In 2021, DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI) launched the AlphaFold Protein Structure Database, initially with roughly 350,000 predicted structures. A 2022 expansion brought the total to over 200 million, covering nearly every protein sequence catalogued in UniProt at the time. It is widely regarded as one of the most significant open-access scientific resources ever created. Researchers worldwide can download structure predictions at no cost, without having to run the model themselves.
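
As a rough illustration of what downloading a prediction can look like, the Python sketch below builds a file URL from a UniProt accession. The file naming pattern and the default version number are assumptions based on how the database has published its files and may change, so check the database's own download page before relying on them. The function names are hypothetical.

```python
import urllib.request

# Assumption: base location of per-entry model files on the database site.
ALPHAFOLD_FILES_BASE = "https://alphafold.ebi.ac.uk/files"


def alphafold_model_url(uniprot_accession, version=4, file_format="pdb"):
    """Build the assumed URL for the first fragment (F1) of a predicted model."""
    return (f"{ALPHAFOLD_FILES_BASE}/AF-{uniprot_accession}"
            f"-F1-model_v{version}.{file_format}")


def download_model(uniprot_accession, destination, version=4):
    """Fetch the model file over HTTP and save it to the given path."""
    url = alphafold_model_url(uniprot_accession, version)
    urllib.request.urlretrieve(url, destination)
    return destination


# P69905 is the UniProt accession for human hemoglobin subunit alpha.
print(alphafold_model_url("P69905"))
```

Calling download_model("P69905", "hemoglobin_alpha.pdb") would attempt the actual download; it is left out of the example so the sketch runs without network access.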

Impact on Drug Discovery and Disease Research

AlphaFold predictions are already accelerating research in tangible ways:

  • Neglected tropical diseases: Structures of proteins from parasites like Trypanosoma brucei (sleeping sickness) and Plasmodium falciparum (malaria) — previously unsolved — are now available for drug target analysis.
  • Antibiotic resistance: Understanding the structures of bacterial resistance proteins opens new avenues for drug design.
  • Cancer biology: Mutant oncoprotein structures help explain how cancer-causing mutations alter protein function.
  • Structural genomics: Researchers can now annotate entire proteomes structurally, identifying functional domains even in poorly characterized proteins.

Important Limitations to Keep in Mind

AlphaFold is not a complete solution to all structural biology questions. Key limitations include:

  • It predicts a single static structure, not the range of conformations a protein adopts dynamically in solution.
  • Confidence is lower for intrinsically disordered regions, which are biologically important in many signaling and regulatory proteins.
  • AlphaFold2 does not natively model protein-ligand interactions or predict how a small-molecule drug might bind (its successor, AlphaFold3, addresses some of this).
  • Predictions should still be validated experimentally for critical applications.
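
The point about disordered regions suggests a practical check: flag stretches of consecutive low-confidence residues before interpreting a model. The Python sketch below does this for a list of per-residue pLDDT scores. The threshold of 50 and the minimum stretch length of 3 are illustrative assumptions, and the function name is hypothetical. Low pLDDT hints at possible disorder; it does not prove it.

```python
def low_confidence_segments(plddt_scores, threshold=50.0, min_length=3):
    """Return (start, end) residue ranges, 1-based and inclusive, where
    pLDDT stays below the threshold for at least min_length residues."""
    segments = []
    start = None
    for index, score in enumerate(plddt_scores, start=1):
        if score < threshold:
            if start is None:
                start = index  # a low-confidence stretch begins here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start, index - 1))
            start = None
    # Close a stretch that runs to the end of the chain.
    if start is not None and len(plddt_scores) - start + 1 >= min_length:
        segments.append((start, len(plddt_scores)))
    return segments


# Invented scores: low at both ends, one isolated dip in the middle.
toy_scores = [35, 40, 42, 88, 91, 95, 45, 90, 30, 33, 38, 41]
print(low_confidence_segments(toy_scores))  # [(1, 3), (9, 12)]
```

The isolated dip at residue 7 is ignored because it is shorter than the minimum length, which keeps single noisy residues from being reported as candidate disordered regions.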

AlphaFold3 and What Comes Next

In 2024, Google DeepMind and Isomorphic Labs released AlphaFold3, which extends predictions to complexes of proteins with DNA, RNA, and small-molecule ligands, a major step toward modeling the full complexity of molecular interactions inside cells. The pace of development in AI-driven structural biology shows no sign of slowing. Combined with experimental methods, deep learning is making it possible to study the molecular basis of many biological processes at a scale and speed that were previously out of reach.