Big Data

Big data can be characterized by the 3 V’s: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data, much of which cannot be integrated easily.

Because of its structure and volume, big data takes too much time and costs too much money to load into a traditional relational database for analysis. New approaches to storing and analyzing data have emerged that rely less on data schema and quality. Instead, raw data with extended metadata is aggregated in a data lake, while machine learning and artificial intelligence (AI) programs use complex algorithms to look for repeatable patterns.
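
As a rough illustration of storing raw data with metadata and deferring the schema until analysis time, the sketch below uses an in-memory list as a stand-in for a data lake. The names `lake`, `ingest` and `read_as_records`, the metadata fields and the sample inputs are all hypothetical and chosen only for this example; a real data lake would sit on distributed storage.

```python
import json
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a data lake: raw payloads are kept
# untouched, and only descriptive metadata is attached at ingest time.
lake = []

def ingest(raw, source, fmt):
    """Store the raw payload as-is, together with extended metadata."""
    lake.append({
        "raw": raw,
        "metadata": {
            "source": source,
            "format": fmt,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    })

def read_as_records(entry):
    """Apply a schema only when the data is read (schema-on-read)."""
    fmt = entry["metadata"]["format"]
    raw = entry["raw"]
    if fmt == "json":
        return [json.loads(raw)]
    if fmt == "csv":
        header, *rows = raw.strip().splitlines()
        keys = header.split(",")
        return [dict(zip(keys, row.split(","))) for row in rows]
    # Unstructured text is passed through without further parsing.
    return [{"text": raw}]

# Made-up inputs in three different formats.
ingest('{"user": "a", "clicks": 3}', "web", "json")
ingest("user,clicks\nb,5\nc,7", "export", "csv")
ingest("free-form support note", "helpdesk", "text")

records = [record for entry in lake for record in read_as_records(entry)]
```

Nothing is validated or converted when the data arrives, so the CSV values stay as strings until a reader decides how to interpret them. That is the trade-off the paragraph above describes: cheaper loading in exchange for less reliance on schema and quality.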

Big data analytics is often associated with cloud computing because the analysis of large data sets in real time requires a platform like Hadoop to store large data sets across a distributed cluster and MapReduce to coordinate, combine and process data from multiple sources. To support this need, Hadoop appliances have been built to help companies take advantage of the semi-structured and unstructured data they own.
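
The MapReduce pattern mentioned above can be sketched in a few lines of single-process Python. This is not Hadoop's API; it is a minimal illustration of the map, shuffle and reduce steps, and the function names and sample inputs are assumptions made for the example. On a real cluster, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def run_job(sources):
    """Run map, shuffle and reduce over records from several sources."""
    pairs = []
    for source in sources:
        for record in source:
            pairs.extend(map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Two made-up sources standing in for data held on different nodes.
source_a = ["big data is for machines", "small data is for people"]
source_b = ["big data big clusters"]

counts = run_job([source_a, source_b])
```

Here `counts` maps each word to the number of times it appears across both sources. Because each record is mapped independently and each key is reduced independently, the same job can be split across a distributed cluster without changing the logic.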

Big data can be contrasted with small data, another evolving term that's often used to describe data whose volume and format make it easy to use for self-service analytics. A commonly quoted axiom is that "big data is for machines; small data is for people."