Applying Big Data to the Protein Folding Problem

 

One problem that has benefited from Big Data techniques is uncovering the complex folds and structures of hundreds of protein families. Researchers applied these techniques to a massive database of 2 billion genome sequences to decode the structures of more than 600 protein families. This immense dataset provides a crucial foundation for understanding how proteins assume their intricate three-dimensional shapes.

The data was processed using Rosetta@home, a distributed computing platform. This significantly reduced the cost and time required to decode the protein structures, and it has rapidly accelerated scientists' understanding of proteins, which supports research across a range of scientific fields.
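To give a feel for what "distributed computing" means here, the toy sketch below splits a search into independent work units and keeps the best result. It is not the Rosetta software: the energy function, the function names and the seeds are all made up for illustration, and threads on a single machine stand in for the many separate computers a real platform would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_work_unit(seed: int, steps: int = 1000) -> tuple[float, int]:
    """Toy stand-in for one work unit: random search for a low 'energy' score."""
    rng = random.Random(seed)  # each unit is independent and reproducible
    best = float("inf")
    for _ in range(steps):
        x = rng.uniform(-10.0, 10.0)
        energy = (x - 3.0) ** 2  # hypothetical energy function, minimum at x = 3
        best = min(best, energy)
    return best, seed


def collect_best(seeds: list[int], workers: int = 4) -> tuple[float, int]:
    """Hand each seed to a worker and keep the lowest-energy result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_work_unit, seeds))
    return min(results)  # tuples compare by energy first, then seed


best_energy, best_seed = collect_best(list(range(16)))
print(f"lowest energy {best_energy:.6f} came from work unit {best_seed}")
```

The point of the design is that no work unit depends on any other, so adding more machines simply means more candidates explored in the same amount of time.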

In 2018 an AI model, AlphaFold, was used to develop this approach further and increased the number of protein structures decoded sixfold. Its successor, AlphaFold2, uses deep learning to take the discovery of protein structures to a far larger scale. Over roughly 60 years, scientists working on protein structures determined around 150,000 of them experimentally, whereas AlphaFold2 has predicted over 200 million.
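The size of that jump is easier to see with a quick calculation. The short sketch below uses only the rounded figures quoted above, so the results are rough approximations rather than exact counts.

```python
# Figures as quoted in the post (rounded, not exact counts).
experimental_structures = 150_000    # found over roughly 60 years
predicted_structures = 200_000_000   # predicted by AlphaFold2
years = 60

# How many AlphaFold2 predictions exist for each experimentally found structure.
ratio = predicted_structures / experimental_structures

# Average number of structures found per year by experimental methods.
per_year = experimental_structures / years

print(f"AlphaFold2 predictions per experimental structure: {ratio:,.0f}")
print(f"Average experimental structures per year: {per_year:,.0f}")
```

On these figures, AlphaFold2 produced over a thousand predicted structures for every one that had been worked out in the lab.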

 

https://www.geekwire.com/2017/big-data-rosetta-protein-puzzles/

https://magazine.hms.harvard.edu/articles/did-ai-solve-protein-folding-problem

https://www.youtube.com/watch?v=P_fHJIYENdI&ab_channel=Veritasium
