What is big data?
Answer by Balaji Viswanathan:
Big Data, Cloud, and Internet of Things are sexy marketing buzzwords that describe existing technologies that are ready for the mainstream. In fact, at LinuxCon I attended a talk that emphasized creating such marketing goo to help whip up excitement.
[Image: Dilbert comic strip for 07/29/2012, from the official Dilbert comic strips archive.]
Big Data used to be called Analytics/Business Intelligence before the industry felt the need for a sexier term. If you have ever drawn a chart in Excel out of a column of data, you have used a tiny version of "Big Data"; the only difference is that the scale is massive. Big data just means making sense of a large volume of data.
OK, enough cynicism.
How is Big Data different from "little data"?
Let's assume you have a leak in a water pipe in your garden. You take a bucket and some sealing material to fix the problem. After a while, you see that the leak is so much bigger that you need a specialist (a plumber) to bring bigger tools. In the meantime, you are still using the bucket to drain the water. After a while, you notice that a massive underground stream has opened up and you need to handle millions of liters of water every second.
You don't just need new buckets, but a completely new approach to the problem, simply because the volume and velocity of the water have grown. To prevent the town from flooding, maybe you need your government to build a massive dam that requires enormous civil engineering expertise and an elaborate control system. To make things worse, water is gushing out of nowhere everywhere, and everyone is scared by the variety.
Welcome to Big Data.
I will give you an example from my previous startup. [More details: Does Social Media Affect Capital Markets?] We had a hypothesis that we could understand the market psychology by looking at the tweets. For instance, if I want to predict the movement of Apple stock, I could look at the tweets related to:
- Media perceptions of Apple – how many times the company/product gets mentioned in major media.
- Customer perceptions of Apple – are the customers positive or negative about the upcoming iPhone 6? Will people continue to buy Apple?
- Employee perceptions of Apple – are there any tweets from Cupertino [the company's location] that could be linked to some employees of the company? How happy or sad are they?
- Investor perceptions of Apple – what do sophisticated investors and analysts think about Apple?
The sum of all these perceptions will determine the future price of Apple's stock. Getting that right could mean billions of dollars.
To put it in layman's terms, if we could really understand what different people are saying about a particular company and its products, we could somewhat predict its future earnings and thus the direction in which the stock price would move. That would be a huge advantage to some investors.
[Related: Babson MBAs Use Social Media to Predict Moves in the Stock Market]
However, the problem is this:
- There are over 500 million tweets every day, flowing in every second (High Volume & Velocity)
- We have to understand what each tweet means – where is it from, what kind of a person is tweeting, is it trustworthy or not. (High Variety)
- We have to identify the sentiment – is this person talking negatively or positively about the iPhone? (High Complexity)
- We need to have a way to quantify the sentiment and track it in real time. (High Variability)
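To make the last two points concrete, here is a minimal sketch of quantifying sentiment and tracking it over a sliding time window. It is not the method my startup used: the word lists, `score_tweet`, and `RollingSentiment` are hypothetical names invented for illustration, and a real system would need a far better sentiment model than counting words.

```python
from collections import deque

# Hypothetical toy lexicons (assumption); real sentiment analysis is much harder.
POSITIVE = {"love", "great", "amazing", "buy"}
NEGATIVE = {"hate", "broken", "terrible", "sell"}


def score_tweet(text):
    """Return (number of positive words) - (number of negative words)."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


class RollingSentiment:
    """Average tweet score over the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, score) pairs, oldest first
        self.total = 0

    def add(self, timestamp, text):
        """Record one tweet and return the current windowed average."""
        score = score_tweet(text)
        self.events.append((timestamp, score))
        self.total += score
        # Drop tweets that have fallen out of the time window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_score = self.events.popleft()
            self.total -= old_score
        return self.total / len(self.events)


tracker = RollingSentiment(window_seconds=60)
first = tracker.add(0, "I love the new iPhone, amazing!")
second = tracker.add(30, "My iPhone is broken. Terrible.")
third = tracker.add(70, "meh")  # the tweet from t=0 has now expired
```

Even this toy version hints at the scale problem: at hundreds of millions of tweets a day, the scoring step and the window bookkeeping both have to be spread across many machines.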
The key difference between today's Big Data and yesterday's analytics is that we have far more volume, velocity, variety, variability and complexity of data. [These are called the 5 Key Elements of Big Data.]
Applications
Big data covers problems that involve such large data sets and solutions that require a complex connecting of the dots. You can see such things everywhere.
- Quora and Facebook use Big Data tools to understand more about you and provide you with a feed that, in theory, you should find interesting. The fact that the feed is not interesting should show how hard the problem is.
- Credit card companies analyze millions of transactions to find patterns of fraud. Maybe if you bought a Pepsi on the card followed by a big-ticket purchase, it could be a fraudster?
- My cousin works for a Big Data startup that analyzes weather data to help farmers sow the right seeds at the right time. The startup got acquired by Monsanto for big $$.
- A friend of mine works for a Big Data startup that analyzes customer behavior in real time to alert retailers when they should stock up.
There are similar problems in defense, retail, genomics, pharma, and healthcare that require solutions.
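As a rough illustration of the credit card example above, the "small purchase followed by a big-ticket purchase" pattern can be sketched as a simple rule. The function name `flag_suspicious` and all thresholds are assumptions made up for this sketch; actual fraud detection relies on statistical models over far more signals than this.

```python
def flag_suspicious(transactions, small=5.0, large=500.0, max_gap=600):
    """Flag cards with a small purchase followed soon by a large one.

    `transactions` is an iterable of (card_id, timestamp_seconds, amount)
    tuples sorted by timestamp. The thresholds are illustrative assumptions.
    Returns a list of (card_id, small_purchase_time, large_purchase_time).
    """
    last_small = {}  # card_id -> timestamp of the most recent small purchase
    flagged = []
    for card, ts, amount in transactions:
        if amount <= small:
            last_small[card] = ts
        elif amount >= large:
            prev = last_small.get(card)
            if prev is not None and ts - prev <= max_gap:
                flagged.append((card, prev, ts))
    return flagged


sample = [
    ("A", 0, 1.50),       # small test purchase on card A
    ("B", 10, 2.00),      # small purchase on card B
    ("A", 120, 900.00),   # big-ticket purchase two minutes later
    ("B", 5000, 1200.00), # large purchase, but long after the small one
    ("C", 5100, 800.00),  # large purchase with no small one before it
]
alerts = flag_suspicious(sample)
```

On this sample only card A is flagged. The rule itself is trivial; the Big Data part is running checks like it over millions of transactions as they arrive.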
Summary:
Big Data is a group of problems and technologies related to the availability of extremely large volumes of data that businesses want to connect and understand. The sector is hot now because the data and tools have reached a critical mass. This occurred in parallel with years of educational effort that have convinced organizations that they must do something with their data treasure.