Applying Big Data to the Protein Folding Problem
One problem that Big Data techniques have helped to solve is
uncovering the complex folds and structures of hundreds of protein families. Big Data
techniques were applied to a massive database of 2 billion genome sequences to
decode the structures of over 600 protein families. This immense dataset
provides a crucial foundation for understanding the intricate ways proteins
assume their complex three-dimensional shapes.
The data was processed using Rosetta@home, a distributed computing
platform that spreads the calculations across volunteers' home computers. This
significantly reduced the cost and time required to decode the protein structures.
The innovation has rapidly accelerated scientists' understanding of proteins, which
supports research in a range of scientific fields.
In 2018, DeepMind's AI model AlphaFold was used to develop this
approach further, increasing protein decoding sixfold. Its successor, AlphaFold2, uses
deep-learning technology to predict protein structures at an incredible scale. Over
60 years, all the scientists working on protein structures together determined
around 150,000 of them, whereas AlphaFold2 has predicted over 200
million.
https://www.geekwire.com/2017/big-data-rosetta-protein-puzzles/
https://magazine.hms.harvard.edu/articles/did-ai-solve-protein-folding-problem
https://www.youtube.com/watch?v=P_fHJIYENdI&ab_channel=Veritasium