[MUSIC] In the previous videos, you have heard some of our speakers talk about data sources like social media, emails, and documents, all things you may not typically think of as data sources. In this video, we will define big data and analytics, look at the differences between structured, unstructured, and semi-structured data, and talk about the applications of big data.
For example, a large bank was looking at improving their customer satisfaction ratings. The bank traditionally measured CSAT, or Customer Satisfaction, and NPS, or Net Promoter Score, as the two primary indicators. Both of these were traditionally obtained from periodic surveys. We worked with the bank to obtain and link a large variety of different sources of data. First, we used the surveys and analyzed the natural language text in the open comments field. This gave us a good view of the topic areas of concern to their customers. Second, we recorded all the calls coming to the customer service call center. Based on speech analytics, speech-to-text, and text analytics, we were able to identify the key reasons for the calls and the emotional state of the callers and the responders. Finally, we analyzed the social media channels to assess what their customers were saying about the bank and their competitors. This allowed us to build a complete view of the customers' interactions with and sentiments about the bank, which helped them make better decisions.
The amount of information available is exploding as digitization and the Internet of Things have increased the number of data sources and the value and complexity of data. Now, we use big data as a term to describe a collection of data sets so large and complex that it becomes difficult to process using basic database management tools or traditional data processing applications. The large data sets involved can consist of numerous data formats in a structured, a semi-structured, or an unstructured form.
Let's take a look at what we mean by structured, unstructured, and semi-structured data. Think about the list of names, addresses, and phone numbers found in a phone book. This is an example of structured data. It is well-defined data, like customer names, ages, identifiers, et cetera, that you can collect formally. Popular platforms for structured data include Oracle, Microsoft SQL Server, and Microsoft Access. Big data can be associated with structured data sources, but not exclusively.
Now, let's look at unstructured data. Unstructured data is not broken down into individual components. The data is a bunch of sentences that you need to make sense of, like in a Word document. It is a collection of videos or audio recordings on YouTube. It is millions of emails or pictures or social media posts. It can be a recorded conversation. The challenge is, how do you take this unstructured data and do something meaningful with it?
To understand semi-structured data, take that Word document that represents unstructured data and add metadata, such as tags and keywords, so that it is easily searchable. Now you have semi-structured data. Semi-structured data does not conform to a rigid structural format like a relational table or other standard formats, but it includes tags and other markers to separate data elements.
Big data is not just about the data. It is about the interconnectedness of the data. Big data sets can be linked together, and insights can be derived from those linkages. Today, organizations capture and store an ever-increasing amount of data. Internet availability, interconnectedness, rapid connection speeds, and mobility contribute to the torrent of data points being generated daily. Organizations want to realize the potential value of these extremely large data sets, and they discard less and less information, whether it is customer data or internal data. However, the existing means to process and analyze data cannot scale to extreme sizes economically.
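To make the distinction between structured, unstructured, and semi-structured data more concrete, here is a minimal Python sketch. It is only an illustration: the names, phone numbers, complaint text, and tags are invented for this example, and the helper functions (`find_phone`, `has_tag`, `mentions`) are hypothetical, not part of any tool mentioned in the video.

```python
# Structured: every record has the same well-defined fields, like a phone book.
# (All names and numbers below are made up for illustration.)
phone_book = [
    {"name": "Ana Diaz", "address": "12 Elm St", "phone": "555-0101"},
    {"name": "Raj Patel", "address": "9 Oak Ave", "phone": "555-0102"},
]

# Unstructured: free text with no predefined fields.
complaint_text = "I called twice about the fee on my account and nobody called back."

# Semi-structured: the same text plus metadata tags that make it searchable.
tagged_document = {
    "body": complaint_text,
    "tags": ["fees", "call-center"],
}


def find_phone(records, name):
    """Look up a phone number by exact field match (easy with structured data)."""
    for record in records:
        if record["name"] == name:
            return record["phone"]
    return None


def has_tag(document, tag):
    """Check a metadata tag (possible once the text is semi-structured)."""
    return tag in document.get("tags", [])


def mentions(text, keyword):
    """Crude keyword search, the fallback when there are no fields or tags."""
    return keyword.lower() in text.lower()


print(find_phone(phone_book, "Raj Patel"))
print(has_tag(tagged_document, "fees"))
print(mentions(complaint_text, "fee"))
```

The structured lookup and the tag check are direct and unambiguous, while the keyword search on raw text is only a rough approximation of meaning. That gap is why unstructured data usually needs text or speech analytics before it can be used.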
As far back as 2001, industry analyst Doug Laney, currently with Gartner, articulated a now-mainstream definition of big data as three Vs: volume, velocity, and variety. A fourth V, veracity, is now commonly added to that list.
First, let's look at volume. Volume reflects the size of a dataset. New information is generated daily, and in some cases hourly, creating datasets that are measured in terabytes and petabytes. Many factors contribute to this increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues have emerged, including how to determine relevance within the large data volumes, and how to use analytics to create value from the relevant data.
The second V we want to look at is velocity. This reflects the speed at which data is generated and used. New data is being created every second. In some cases, it may need to be analyzed just as quickly. Radio Frequency Identification (RFID) tags, sensors, and smart metering are driving the need to deal with torrents of data in near real time. Reacting rapidly enough to deal with data velocity is a challenge for most organizations.
Variety is the third V, and it represents the diversity of the data. Data sets will vary by type: social networking, media, text, and so on. And they will vary in how well they are structured. Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; and unstructured data in the form of text documents, email, video, audio, stock ticker data, and financial transactions. Managing, merging, and governing different varieties of data is something many organizations still grapple with.
Next, we have veracity. Data veracity refers to the biases, noise, and abnormality in data.
Is the data that is being stored and mined meaningful to the problem being analyzed? Veracity in data analysis is the biggest challenge when compared to things like volume and velocity. In scoping out your data and analytics strategy, you need to have your team and partners work to help keep your data clean, and create processes to prevent dirty data from accumulating in your systems.
Even more important than the definition of data is what data promises to achieve. Effectively used, data can be transformed into insights and intelligence, delivered where and when they are needed to make and implement strategic and operational decisions. There is one more V to take into account when looking at data and analytics, and that is value. Having access to data creates value only when you have the right data to glean strategic insights. Companies can generate significant value from their data.
An online retailer, for example, was planning to enhance their recommendation engine. The current software relied on a static set of rules to determine one of five different paths through their website. They wanted to modify this to make recommendations based on the individual profile of the customer, the amount of time they spend on a page, the keywords they enter, and what other customers like them have done in the past.
So just how big is big data? Think about these facts. More than half of new data created is in video and audio formats. And by the year 2020, total global Internet traffic will exceed 200 exabytes per month. An exabyte is equal to 1 billion gigabytes. Global mobile traffic will increase to 30 exabytes per month, growing at a compound annual growth rate of 50%. The total number of users with Internet access will exceed 3.5 billion. The total number of mobile devices will exceed 10 billion. Think about that for a minute. What does all that data mean for organizations? How will organizations use this data? Big data is a game changer in making business decisions.
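To get a feel for the traffic figures quoted above, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the 30-day month is an assumption, the function names are hypothetical, and the growth projection simply shows what a 50% compound annual growth rate means mechanically. It is not a forecast from the video.

```python
GB_PER_EXABYTE = 1_000_000_000  # 1 exabyte = 1 billion gigabytes (decimal units)


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes."""
    return exabytes * GB_PER_EXABYTE


def compound_growth(start, annual_rate, years):
    """Value after `years` of growth at a compound annual rate."""
    return start * (1 + annual_rate) ** years


# 200 exabytes per month, expressed in gigabytes.
total_gb_per_month = exabytes_to_gigabytes(200)

# Average throughput, assuming a 30-day month (an assumption for this sketch).
seconds_per_month = 30 * 24 * 60 * 60
gb_per_second = total_gb_per_month / seconds_per_month

print(f"{total_gb_per_month:,} GB per month")
print(f"about {gb_per_second:,.0f} GB per second on average")

# What a 50% compound annual growth rate does to 30 exabytes per month
# over a few hypothetical years.
for year in range(4):
    print(year, compound_growth(30, 0.50, year))
```

Under these assumptions, 200 exabytes per month works out to roughly 77,000 gigabytes every second, and a quantity growing at 50% per year more than doubles in two years.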
Let's look at how organizations are currently using social media. Traditionally, businesses use the convening power of social media to boost their image and better anticipate consumer trends. Few organizations have harnessed the potential power of social media for applications beyond marketing and public relations.
Let's take an example. A large investment bank was worried about its compliance risk. Regulators were monitoring customer complaints made directly to them as well as the social media channels of the financial institutions. The investment bank built a social media dashboard that regularly monitored customer complaints made on their social media site as well as on other public forums. The dashboard captured the rate of change of the number of messages on a particular topic, as well as the rate of change of sentiment with respect to the same topic. This allowed them to react and respond fast whenever there was a change in volume or sentiment related to themselves or their competitors.
Let's recap what we just covered. Big data is made up of structured, unstructured, and semi-structured data. The amount of data that is being produced is growing at an astounding rate. And the key to this data is the interconnectedness of it all. We covered a lot of information here. You can use the interactive PDF to review the big data concepts. In the next video, you will hear from some of our PwC professionals about how we have used big data to solve client issues. [MUSIC]