Limitations of Predictive Analysis

 

In my blog post on traditional statistics, I discussed inferential statistics and the limitations of using a sample to make estimates about the population the sample is drawn from. In the world of Big Data this translates to the concept of predictive analytics: using huge data sets, statistical algorithms, and machine learning to predict future outcomes.

Predictive analytic models come in two types: classification models and regression models. Classification models put data objects into categories and make predictions based on those categories, whereas regression models predict continuous values. For example, a classification model may sort customers into categories and predict how receptive they would be to marketing, whereas a regression model would predict how much money a customer will generate during their relationship with the company. (1)
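To make the difference concrete, here is a minimal sketch in Python. The customer numbers, the `threshold` value, and the function names are all made up for illustration; a real model would be trained on real data rather than using a hand-picked rule.

```python
from statistics import mean

# Hypothetical customers: (visits per month, lifetime spend).
# These numbers are invented purely for illustration.
customers = [(1, 20.0), (2, 40.0), (4, 80.0), (8, 160.0)]


def classify_receptive(monthly_visits, threshold=3):
    """Classification: the output is a category (a discrete label)."""
    return "receptive" if monthly_visits >= threshold else "not receptive"


def fit_line(points):
    """Regression: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(customers)


def predict_spend(monthly_visits):
    """Regression: the output is a continuous number."""
    return slope * monthly_visits + intercept


label = classify_receptive(5)   # a category
spend = predict_spend(5)        # a number
```

The classifier can only ever answer with one of its labels, while the regression can return any value along the fitted line.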

Predictive analytics is becoming an increasingly large part of Big Data, and the theory makes sense: with larger data sets, or larger sample sizes, the predictions made from analysing these samples become more accurate. The issue is that no matter how large your data set is, the result will always be a prediction and never a guarantee. Despite this, businesses and organisations have become overconfident in the accuracy of large data sets when applying them to individuals.
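A small simulation shows why a bigger data set helps with averages but not with individuals. This is only a sketch: the normal distribution, the mean of 100, the spread of 15, and the seed are assumptions I picked for the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable


def summarize(n, mu=100.0, sigma=15.0):
    """Draw n simulated individuals and summarise the sample."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    spread = stdev(sample)         # how much individuals vary
    std_error = spread / sqrt(n)   # uncertainty in the estimated average
    return mean(sample), std_error, spread


results = {n: summarize(n) for n in (100, 10_000)}

for n, (avg, std_error, spread) in results.items():
    print(f"n={n}: average={avg:.1f}, standard error={std_error:.2f}, "
          f"individual spread={spread:.1f}")
```

The standard error of the average shrinks roughly with the square root of the sample size, so the estimate of the group gets tighter. The spread between individuals stays about the same however much data is collected, which is why a prediction about one person remains uncertain.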

Predictive analytics, like all technologies, is a tool that can be used to gain insight into data, but it is not a crystal ball for seeing into the future. This is the nature of all statistics: they are educated guesses when all is said and done, even if the guess has exabytes of data backing it up.

 

One example of where predictive analytics failed is Google Flu Trends (GFT). GFT was a service developed by Google that aimed to use Big Data to analyse Google search query data and predict influenza activity in various regions of the world. The idea was that people would search for certain terms when they were sick, allowing the service to detect rising flu cases. This ultimately failed for a few reasons. One is that people search for these terms even when they are not sick, and increased media coverage of the flu can lead to increased search activity, creating a feedback loop. The algorithm also relied on historical correlations, but each flu season tends to be different and human behaviour changes. Search behaviour, media attention, and even the language people use to describe their symptoms all evolve over time, and these changes affected the accuracy of the results. (2)
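The problem with relying on historical correlations can be illustrated with a toy model. This is not how GFT actually worked and none of the figures below are real; the search counts, case counts, and the doubling caused by media coverage are assumptions made up for the example.

```python
def fit_ratio(searches, cases):
    """Least-squares slope through the origin: cases ~ ratio * searches."""
    return sum(s * c for s, c in zip(searches, cases)) / sum(
        s * s for s in searches
    )


# Invented historical data where searches track flu cases closely.
hist_searches = [100, 200, 300, 400]
hist_cases = [10, 20, 30, 40]
ratio = fit_ratio(hist_searches, hist_cases)

# Invented new season: the same number of real cases, but media
# coverage doubles how often people search for flu terms.
new_cases = [10, 20, 30, 40]
new_searches = [2 * s for s in hist_searches]

# The model still assumes the old relationship between searches and cases.
predicted = [ratio * s for s in new_searches]
overestimate = [p / c for p, c in zip(predicted, new_cases)]
```

Because the relationship learned from the past no longer holds, every prediction in the new season comes out at about twice the real number of cases, even though the model fitted the historical data perfectly.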

1. https://cloud.google.com/learn/what-is-predictive-analytics?hl=en

2. https://theconversation.com/googles-flu-fail-shows-the-problem-with-big-data-19363
