The Future of Big Data – Big Data 2.0


Benjamin Spiegel

The future of big data isn’t about numeric data points but instead about asking the deeper questions and finding out why consumers make the decisions they do.

For data geeks like myself, it has been a hell of a ride. The rise of big data in marketing and media has brought sexy back. Finally, the creative directors, C-suite, and account leaders are leaning on the data scientists once again to provide deep consumer understanding and insights that are backed up and proven by actual consumers (as opposed to an eight-person panel in a Madison Avenue meeting room).

Today, clients often ask me about the future of big data and what the next step is: how can we leverage data on an even deeper level to extract meaningful consumer insights that go beyond where we are now? Most of the standard answers center on the ability to get data and insights in real time and from more devices than ever. While it is true that connected homes, wearables, and connected cars will allow us to collect a much wider set of data points, I believe that this is just an extension of the existing approach.

It’s time we move beyond structured data and into the prime time of text analytics. Here’s why.

Numeric vs. Emotional

Most of the data points collected today are numerical or binary. They tell us if somebody engaged with a site, how well, how long, and where they engaged, but the data fails to tell us why. I believe the future of big data – Big Data 2.0 (to coin a term) – is not about more binary and numeric data points, but instead about asking the deeper questions. Big Data 2.0 should be focused not on what and where but on answering why. It should be concerned with getting a better understanding of the consumer’s emotional state and the decision logic, and thereby provide deeper insight into the consumers’ choices. If we focus on why instead of how often, we can create more meaningful, quality connections between consumers and brands. In other words, while numbers are great indicators of performance, focusing solely on them means brands miss the element of human connection.

Take Amazon data as an example. Amazon is filled with great numerical indicators. Its data can tell us the sales ranks (how many sold relative to category), the customer engagement (how many people shared product reviews), and their satisfaction with the product (the positive and negative reviews). All of these are great indicators, but they are still very simple and only tell a small part of the story.

Let’s assume we are a consumer packaged goods company and we want to introduce a new line of diapers into the market. We decide to look at Amazon in order to better understand which products are category leaders (sales rank and number of sales) and how the consumers like the product itself (reviews). If we analyze these metrics across all diapers, we have a Big Data 1.0 picture that tells us exactly who sells the most and what the audience favorite is.

This is not enough anymore; Big Data 2.0 needs to be about the why: Why is a particular product the most sold? Why does it have an average rating of 5?

What’s the Solution?

For us, the easiest way to get started with Big Data 2.0 is to focus on the unstructured data we collect every day. This can be reviews, customer support emails, community forums, even your own CRM system. The simplest way to look at this data is through a process called text analytics.

Text analytics is a fairly straightforward process that breaks out like this:

  1. Acquisition: Collecting and aggregating the raw data you want to analyze
  2. Transforming & Preprocessing: Cleaning and formatting the data to make it easier to read
  3. Enrichment: Enhancing the data by adding additional data points
  4. Processing: Performing specific analyses and classifications on the data
  5. Frequencies & Analysis: Evaluation of the results and translation into numerical indicators
  6. Mining: Actual extraction of information
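
To make these stages a little more concrete, here is a minimal Python sketch of the first few of them: acquisition (stubbed with a hard-coded list), preprocessing, processing, and frequency counting. The sample reviews, the tiny stop-word list, and the function names are all assumptions made up for illustration; a real project would pull reviews from an export or API and would likely use a proper NLP library.

```python
import re
from collections import Counter

# Acquisition: hypothetical sample reviews standing in for a real export.
RAW_REVIEWS = [
    "Great PRICE for the value!! Would buy again.",
    "Good value, and the price was a special deal.",
    "The tape does not stick. Price was fine.",
]

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "for", "was", "does", "not", "would", "again"}


def preprocess(text):
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text into words and drop stop words."""
    return [word for word in text.split() if word not in STOP_WORDS]


def term_frequencies(reviews):
    """Count how often each remaining term appears across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(preprocess(review)))
    return counts


if __name__ == "__main__":
    # Frequencies and analysis: show the most common terms.
    for term, count in term_frequencies(RAW_REVIEWS).most_common(5):
        print(term, count)
```

Even on three toy reviews, "price" and "value" rise to the top, which is the kind of signal the later stages build on.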

Real-World Uses

Here’s a real-world application using our example above. We are trying to understand the diaper market. In order to not turn this into a step-by-step guide, let’s assume that we have already collected all diaper reviews as well as their quantitative indicators. That means we know what sells best and what ranks best/worst. To take this to the next level, we would start to extract words and phrases from the reviews. This will tell us some of the recurring patterns and their frequencies within the reviews. I actually performed this analysis by evaluating thousands of reviews and found four very actionable insights we would never have gotten to without text analytics.

1. Why Did It Sell So Well?

  • When I looked at the reviews of the top-selling product, I found that the most mentioned terms across the majority of the helpful reviews were “price,” “special,” and “value.” This tells us that people did not buy it because of its quality or features, but because of its pricing. So when we are launching our product, we want to look at this one for price/value guidance instead of features.

2. Why People Did Not Like It

  • This one was very revealing. The brand with the most negative reviews had an extremely high frequency around the terms “tape,” “stick,” “stay closed,” and “open.” After a few reads, I discovered that consumers had no issues with the usual key features of a diaper such as “absorbency,” “leakage,” or “softness,” but actually had issues with the tape on the side of the diaper, and the fact that it kept opening. The number of negative reviews overall that mentioned these issues leads us to believe that this is a feature that brands don’t talk about but consumers care about. Therefore, we would recommend testing ads that address this issue.
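
A comparison like this can be approximated by measuring what share of negative reviews mention each group of terms. The sketch below is only an illustration: the sample reviews are invented, the term groups are taken from the words quoted above, and `mention_share` is a hypothetical helper that uses plain substring matching (so "open" also matches "opening").

```python
# Hypothetical negative reviews, invented for illustration.
NEGATIVE_REVIEWS = [
    "The tape would not stick after the first change.",
    "Tabs kept coming open overnight.",
    "Too expensive for what you get.",
    "Would not stay closed on an active toddler.",
]

# Term groups based on the words quoted in the article.
CLOSURE_TERMS = ["tape", "stick", "stay closed", "open"]
FEATURE_TERMS = ["absorbency", "leakage", "softness"]


def mention_share(reviews, terms):
    """Return the fraction of reviews that mention at least one of the terms."""
    if not reviews:
        return 0.0
    hits = sum(
        1 for review in reviews
        if any(term in review.lower() for term in terms)
    )
    return hits / len(reviews)


if __name__ == "__main__":
    print("closure issues:", mention_share(NEGATIVE_REVIEWS, CLOSURE_TERMS))
    print("usual features:", mention_share(NEGATIVE_REVIEWS, FEATURE_TERMS))
```

A large gap between the two shares is what would point to an issue that sits outside the usual feature list.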

3. Smart Filtering

  • One interesting issue we came across is that a lot of the negative reviews were not actually about the product but rather focused on shipping, stock level, and packaging concerns. By tagging and removing these from the set, we are able to evaluate purely on a product level in order to focus on product-related concerns. If we were to list our diaper on Amazon, we would recommend adding a shipping and stock level guarantee prominently in the copy – a competitive advantage that speaks directly to consumer concerns.
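
The tagging step can be as simple as a keyword rule. The following sketch assumes a hand-picked list of logistics terms and invented sample reviews; `tag_review` and `product_only` are hypothetical names. Note that a review mentioning both shipping and a product flaw would be tagged as logistics here, so a real filter would need a more careful rule or a trained classifier.

```python
# Hand-picked logistics keywords (an assumption for illustration).
LOGISTICS_TERMS = ("shipping", "delivery", "out of stock", "packaging")

# Hypothetical reviews, invented for illustration.
SAMPLE_REVIEWS = [
    "Shipping took three weeks.",
    "The tape keeps opening.",
    "Packaging was torn when it arrived.",
    "Leaked on the first night.",
]


def tag_review(review):
    """Tag a review as 'logistics' if it mentions a logistics term, else 'product'."""
    text = review.lower()
    if any(term in text for term in LOGISTICS_TERMS):
        return "logistics"
    return "product"


def product_only(reviews):
    """Keep only the reviews tagged as product-related."""
    return [review for review in reviews if tag_review(review) == "product"]


if __name__ == "__main__":
    for review in product_only(SAMPLE_REVIEWS):
        print(review)
```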

4. What Do They Want?

  • From an R&D perspective, this insight is worth gold. By evaluating reviews that have terms like “I wish,” “hope,” or “they should,” we are able to detect common features consumers are looking for when thinking about diapers. These are great insights that address the constantly changing needs of consumers. We can feed these product feature-specific insights to our R&D team as well as our copywriters.
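
One simple way to surface these requests is to pull out every sentence that contains one of the trigger phrases. The sketch below uses Python's standard `re` module; the sample reviews are invented, the sentence splitter is deliberately naive, and `wish_sentences` is a hypothetical helper name.

```python
import re

# Trigger phrases quoted in the article, matched as whole words, case-insensitive.
WISH_PATTERN = re.compile(r"\b(i wish|hope|they should)\b", re.IGNORECASE)

# Hypothetical reviews, invented for illustration.
SAMPLE_REVIEWS = [
    "Good fit overall. I wish the tabs were reusable.",
    "They should offer a fragrance-free version!",
    "Arrived on time.",
    "I hope they add a wetness indicator. Otherwise fine.",
]


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def wish_sentences(reviews):
    """Return every sentence that contains one of the trigger phrases."""
    found = []
    for review in reviews:
        for sentence in split_sentences(review):
            if WISH_PATTERN.search(sentence):
                found.append(sentence)
    return found


if __name__ == "__main__":
    for sentence in wish_sentences(SAMPLE_REVIEWS):
        print(sentence)
```

The extracted sentences can then be grouped by hand or run through the same frequency counting used on the full review set.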

As you can see, when analyzing the diaper category on Amazon alone, Big Data 2.0 yielded insights beyond binary performance indicators. We could see the crowd favorites, but until our text analytics exercise we did not know the “why” behind purchases or understand the positive and negative reviews. There are countless consumer insights to be mined from textual, unstructured data that give us the voice of the consumer, their motivations, and a deeper understanding of their purchasing behavior.

I hope the above examples and thoughts gave you some good ideas and inspiration on how to think about text analytics for your organization and projects. Start looking at your existing data, export your CRM, examine your comments on your website or product mentions in topic forums – even emails from your sales department’s inbox. It’s Big Data 2.0 time and that’s where you’ll find the gold.

If there is enough interest in this subject, I can create several articles that take a deeper look at the actual stages, processes, and tools for text analysis. Feel free to tweet me at @nxfxcom with any questions or comments.