July 28, 2022
AlphaFold produces database of proteins after Meta publishes a rival protein-folding model
DeepMind has published predicted structures for 200 million proteins, which it says represents “nearly all cataloged proteins known to science.”
They were generated by AlphaFold, DeepMind’s AI program that can accurately predict a protein’s 3D shape from its sequence of amino acids. AlphaFold was first released in 2018; DeepMind published a second version in 2020 and released the source code a year later.
DeepMind said the structures it published have the potential to “dramatically increase our understanding of biology.”
“AlphaFold is a glimpse of the future and what might be possible with computational and AI methods applied to biology,” said DeepMind CEO Demis Hassabis.
“Just as maths is the perfect description language for physics, we believe AI might turn out to be just the right technique to cope with the dynamic complexity of biology.”
What is AlphaFold?
Each living cell contains billions of proteins, which act as molecular machines that control the body’s vital cell functions. Each protein consists of a string of amino acids arranged like a necklace. The string ‘folds’ itself into a 3D shape based on the interactions of these amino acids, and that shape determines the task the protein performs in the body, such as carrying oxygen in the blood from the lungs to body tissues.
Being able to predict a protein’s 3D structure could enable faster drug discovery as scientists could better understand how the body’s proteins relate to diseases.
AlphaFold, a deep-learning neural network, aims to make that possible. The model it uses has 21 million parameters and was trained on more than 170,000 proteins from a public repository of protein sequences and structures.
The system uses an attention network, a deep learning technique in which an algorithm identifies parts of a larger problem and then pieces them together to obtain the overall solution. It can predict a structure in minutes or hours, depending on the size of the protein.
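To give a rough sense of what "attention" means in practice, the sketch below computes scaled dot-product attention, the textbook building block of attention networks. It is not DeepMind's code, and AlphaFold's actual architecture is far more elaborate; the function names and the toy input vectors are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score how relevant every position is to this query.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those weights.
        blended = [sum(w * v[i] for w, v in zip(weights, values))
                   for i in range(len(values[0]))]
        outputs.append(blended)
    return outputs

# Hypothetical toy input: three positions, each described by two numbers.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(toy, toy, toy)
```

Each output vector is a weighted mix of all the input positions, with the largest weights going to the positions most similar to the one being examined. That is the sense in which the technique looks at parts of a problem and pieces them together.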
Just last year, DeepMind released the protein structures for 20 species – including nearly all 20,000 proteins expressed by humans – in an open database.
Scientists are already using some of DeepMind’s earlier predicted structures in their work. Researchers from the University of Oxford are using AlphaFold models to work out malaria parasite protein structures in a bid to find antibodies that could block transmission of the parasite.
DeepMind’s latest release includes predicted protein structures for plants, bacteria, animals and other organisms, and the company suggests the model’s work could be applied to ongoing issues including sustainability and food insecurity.
All 200 million of the predicted structures can be downloaded via Google Cloud Public Datasets.
And the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI) is hosting the AlphaFold Protein Structure Database to ensure the predictions are more widely available to scientists.
DeepMind’s database release comes just days after AI researchers from Meta published ESMFold, a rival model that reportedly achieves accuracy “competitive” with AlphaFold.
Meta’s AlphaFold challenger boasts 15 billion parameters and can accurately predict full atomic protein structures from a single sequence of a protein.
“ESMFold inference is an order of magnitude faster than AlphaFold2, enabling exploration of the structural space of metagenomic proteins in practical timescales,” Meta’s research paper reads.
Meta research engineer Zeming Lin said his team plans to open source ESMFold in the future to challenge rival model AlphaFold2.