The Characteristics of Big Data

Due to its scale, the variety of its contents, and its general complexity, Big Data can be hard to get your head around. For that reason it is often discussed in terms of four key characteristics: Volume, Variety, Velocity, and Variability.

Volume

This characteristic refers to the vast quantities and scale of data being produced and stored. Ordinary data is usually measured in megabytes, gigabytes, and terabytes. Big Data is much larger: the smallest amounts are measured in terabytes, and it more often involves petabytes and even exabytes!

1TB could roughly store the Lord of the Rings trilogy extended editions in 8K resolution.

1PB could roughly store the Lord of the Rings trilogy extended editions in 8K resolution 1000 times.

1EB could roughly store the Lord of the Rings trilogy extended editions in 8K resolution 1,000,000 times!

(https://www.overcasthq.com/blog/how-big-are-video-files/, based on an approximation of 11 hours of footage at 90GB per hour)

An exabyte could store roughly 1,270 years of continuous Lord of the Rings footage in 8K! This kind of data seems unimaginable, but with the world becoming more digital by the day, these amounts are constantly being produced. Companies like Google and Facebook process and store data on this scale daily, including user engagement, analytics, and backup data. Streaming services deliver large video files to millions of people per day, businesses process millions of transactions, and the CERN particle accelerator reportedly generates around one petabyte of data per second! (1)
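The arithmetic behind these comparisons can be sketched in a few lines, assuming the cited approximation of 11 hours of footage at roughly 90GB per hour:

```python
# Rough storage arithmetic for the volume comparisons above.
# Assumes ~11 hours of footage at ~90 GB/hour (per the cited approximation).
GB = 10**9
TB, PB, EB = 10**12, 10**15, 10**18

trilogy_bytes = 11 * 90 * GB  # ~990 GB, roughly one terabyte

print(round(TB / trilogy_bytes, 2))   # trilogies per terabyte (~1)
print(round(PB / trilogy_bytes))      # trilogies per petabyte (~1,000)
print(round(EB / trilogy_bytes))      # trilogies per exabyte (~1,000,000)

# Years of continuous 8K footage in one exabyte:
hours = EB / (90 * GB)
print(round(hours / 24 / 365))
```

Running the numbers this way shows an exabyte holds on the order of a thousand years of continuous footage at that bitrate.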

Variety

This characteristic refers to the different forms data can take. Because of its sheer scale, Big Data comes in a wide range of formats. Traditional data is usually structured and fits neatly into databases, but Big Data also includes semi-structured and unstructured data. Typical structured data could include:

  • Customer Databases
  • Financial Transaction Records
  • Spreadsheet Data
  • Point-of-Sale Data
  • Library Catalogs
This might be the kind of data people would think of if asked for examples of data: it is easily recorded, organised, and searched. But in today's modern world, with more and more things connected to networks and recording data, data has taken on many new forms and comes from unexpected places:
  • Raw Genome Sequencing Data
  • Raw Data from LiDAR Scanners
  • Scanned Handwritten Notes for Character Recognition Software
  • Surveillance Footage
On top of the vast quantities of data, this variety makes Big Data complex to process and manage. But while it adds complexity, it can also add value, as data from different sources can provide a more comprehensive view of whatever the data describes. Flexible solutions such as NoSQL databases can store and integrate data from different sources for analysis. (2)
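As a rough illustration of why that flexibility helps, the sketch below stores records of very different shapes side by side, using a plain Python list to stand in for a document-oriented NoSQL database. The field names and values are made up for the example:

```python
# A minimal sketch of schema flexibility in a document store,
# using a plain Python list in place of a real NoSQL database.
# Field names here are illustrative, not from any real dataset.
records = [
    {"source": "pos",    "sale_id": 101, "amount": 19.99},        # structured
    {"source": "sensor", "lidar_points": [[1.2, 0.4, 5.6]]},      # raw sensor data
    {"source": "notes",  "scanned_text": "meet supplier @ 3pm"},  # unstructured
]

# Unlike a fixed relational schema, each record keeps its own shape;
# queries simply filter on whatever fields are present.
sensor_data = [r for r in records if "lidar_points" in r]
print(len(sensor_data))  # 1
```

A relational table would force all three records into one rigid schema; here each source keeps its natural structure until analysis time.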

Velocity

This characteristic refers to the speed at which data is created and collected, and the speed at which it is processed and analysed. Data today is produced faster than ever, and some applications require real-time processing. This matters to businesses and organisations because it lets them make decisions based on accurate, up-to-date data. Because of the speed at which this data is generated, collected, and analysed, Big Data technologies are required instead of traditional databases. Today's world generates large amounts of data at breakneck speeds, such as:

  • Social Media - Social media platforms like Twitter and Instagram generate huge amounts of user data every second, including posts, comments, and pictures. Big Data technologies can analyse this data in real time to identify trends, monitor user behaviour, and deliver personalised content.
  • Internet of Things Devices - IoT devices and sensors produce high speed data streams. Big Data technologies can be used to analyse these data streams for things like smart factories.
  • Financial Transaction/Stock Market Data - Financial systems rely on fast data velocity to process transactions and stock market activity. In the stock market, for example, it is essential that these large amounts of data are processed in real time to facilitate trading. (3)

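As a toy illustration of real-time analysis, the sketch below counts hashtag mentions over a sliding time window, the way a simple social media trend detector might. The window size and events are made up for the example:

```python
# A minimal sketch of stream processing: counting hashtag mentions
# over a sliding time window, as a simple trend detector might.
from collections import Counter, deque

WINDOW = 60       # window length in seconds (assumed for the example)
events = deque()  # queue of (timestamp, hashtag) pairs

def ingest(ts, tag):
    """Add a new event and evict anything older than the window."""
    events.append((ts, tag))
    while events and events[0][0] < ts - WINDOW:
        events.popleft()

def trending(n=1):
    """Return the n most frequent hashtags currently in the window."""
    return Counter(tag for _, tag in events).most_common(n)

# Synthetic stream: timestamps are seconds since the stream started.
for ts, tag in [(0, "#bigdata"), (10, "#ai"), (70, "#ai"), (75, "#ai")]:
    ingest(ts, tag)

print(trending())  # [('#ai', 2)] -- the '#bigdata' event has aged out
```

Real systems (Kafka, Spark Streaming, Flink and the like) distribute this idea across clusters, but the core pattern of ingesting, windowing, and aggregating on the fly is the same.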
Variability

This characteristic refers to the inconsistency and complexity of Big Data. The sheer amount of data, its different sources, and the velocity at which it is produced mean that data patterns change over time. The data can arrive in different formats and vary in quality and accuracy, which can lead to inconsistency over time. It also means that information that is accurate today can become redundant very quickly, and the same data can be interpreted in different ways. As AI and large language models become more mainstream and make use of Big Data technologies, it is easy to see how data variability can have an impact. (4)
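As a small illustration of variability in practice, the sketch below normalises a date field that arrives in inconsistent formats from different sources. The list of expected formats is an assumption for the example:

```python
# A minimal sketch of handling variability: the "same" field arriving
# in inconsistent formats from different sources, normalised on ingest.
from datetime import datetime

# Assumed set of formats seen across sources (illustrative only).
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

def normalise(raw):
    """Try each known format; return an ISO date or None if unparseable."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag unrecognised records for manual review

print(normalise("2024-03-01"))   # 2024-03-01
print(normalise("01/03/2024"))   # 2024-03-01
print(normalise("Mar 1, 2024"))  # 2024-03-01
print(normalise("yesterday"))    # None
```

Three different representations collapse to one canonical value, while genuinely ambiguous input is flagged rather than silently guessed, which is exactly the kind of cleaning that variable data forces on a pipeline.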


1. https://blog.westerndigital.com/inside-cerns-exabyte-data-center/

2. https://www.ibm.com/think/topics/big-data

3. https://www.fanruan.com/en/glossary/big-data/data-velocity

4. https://www.simplilearn.com/5-vs-of-big-data-article


