Data is generated on a massive scale every minute by most of the things you do in a day: searching the web, posting on social media, going shopping or using transport. We output roughly 2.5 quintillion bytes of data a day, and 90% of the data that currently exists was created in the last couple of years alone. This big data explosion is a product of the high-tech world we live in, which has given rise to cheap and numerous information-sensing devices such as mobile phones. Here we will discuss the different types of data, what constitutes “Big Data” and how it is used, and the categories of big data known as ‘The Four V’s of Data’.
Structured vs Unstructured Data
Data generated can be categorised into two main formats:
Structured Data: generally refers to easy-to-search data that has a defined length and format, such as numbers, dates and groups of words. Structured data has been collected and analysed for a long time, so the technology for handling it is more mature. It usually resides in relational databases (managed by an RDBMS) that store numbers, codes or text in fixed fields, which makes it easy for both humans and algorithms to query and analyse.
Unstructured Data: is more complex, less easily defined and cannot be organised using pre-defined data models. Examples include emails, videos, music and social media posts. Collecting, synthesising and analysing this type of data is more complicated, and the technology for doing so is newer and still developing.
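The difference can be seen in a minimal sketch. The table, column names and email texts below are made up for illustration: the structured records can be queried directly with SQL, while the free-text emails need extra parsing (here a simple regular expression) before any figure can be pulled out of them.

```python
import re
import sqlite3

# Structured data: fixed columns with defined types in a relational table.
# The "orders" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 20.0, "2024-01-05"),
        (2, "bob", 35.5, "2024-01-06"),
        (3, "alice", 12.5, "2024-01-09"),
    ],
)


def total_spent(customer):
    """Answer a question about structured data with a single query."""
    row = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] or 0.0


# Unstructured data: free text with no pre-defined fields (hypothetical emails).
emails = [
    "Hi, my order arrived late and I paid $20 for express delivery!",
    "Thanks for the quick refund of $35.50, great service.",
]


def extract_amounts(text):
    """Pull dollar amounts out of free text; the pattern is a rough heuristic."""
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]


print(total_spent("alice"))
print([extract_amounts(e) for e in emails])
```

Even this small example shows why unstructured data is harder to work with: the query states exactly what it wants, whereas the text parser has to guess at how people might write an amount.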
So why is data beneficial and what can we do with it?
You can analyse data sets to reveal trends and patterns. This is useful in a wide range of settings, but it is most often used by companies to track consumer activity and then target their marketing accordingly. Previously, due to limitations in technology, most of the data analysed was structured and on a smaller scale, meaning massive amounts of generated data were discarded. This discarded data could have proved very useful in providing more background on consumer behaviour, because the greater the quantity and variety of data you can analyse, the better you can predict trends and analyse markets. This is where Big Data comes in.
Big data refers to these massive data sets, generated from both traditional and digital sources, that are used to identify trends and patterns. The data sets are so big and complex, and include both structured and unstructured data, that traditional data processing software is inadequate to deal with them. Some of the tools currently used to store and analyse Big Data are Hadoop, Microsoft Azure HDInsight and NoSQL databases. Big Data analytics gives businesses greater intelligence, improving their decision-making, their understanding of customer needs and their development of products and services, and enhancing staff productivity and efficiency.
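To make the idea of identifying a trend concrete, here is a toy sketch. The purchase events and the function names are invented for illustration, and a real Big Data pipeline would run this kind of aggregation over far larger data sets with distributed tools rather than in a single script.

```python
from collections import Counter

# Hypothetical purchase events: (month, product category).
purchases = [
    ("2024-01", "shoes"), ("2024-01", "books"), ("2024-01", "books"),
    ("2024-02", "shoes"), ("2024-02", "shoes"), ("2024-02", "books"),
    ("2024-02", "shoes"),
]


def counts_by_month(events):
    """Count purchases per category for each month."""
    result = {}
    for month, category in events:
        result.setdefault(month, Counter())[category] += 1
    return result


def fastest_growing(events, earlier, later):
    """Return the category whose purchase count grew most between two months."""
    by_month = counts_by_month(events)
    before = by_month.get(earlier, Counter())
    after = by_month.get(later, Counter())
    # A Counter returns 0 for categories that are missing in one of the months.
    growth = {c: after[c] - before[c] for c in set(before) | set(after)}
    return max(growth, key=growth.get), growth


top, growth = fastest_growing(purchases, "2024-01", "2024-02")
print(top, growth)
```

A company could use a result like this to decide which product category to promote next month.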
The Four Big V’s of Data
The four V’s are the characteristics used to describe big data:
Veracity: the trustworthiness of the data.
Variety: the types of data (structured vs unstructured).
Volume: the quantity of data.
Velocity: the speed at which data is generated, gathered, processed and used to produce business outcomes.