Welcome. There are two different types of data sets: traditional data sets and big data. As mentioned in the first lesson, the digital thread is the concept of seamlessly integrating information throughout the different stages of the value chain. In this setting, the value chain means all of the entities that contribute to the production of a product, from requirements gathering, design, manufacturing, and testing, through use, to final sustainment and disposal. The digital thread enables specialists to work on the product and process definitions simultaneously and to inform decisions throughout the life of a system or product. Said another way, the digital data generated by a product during its fabrication or manufacturing can be used by designers to redesign the product so that it delivers more value to the customer.

Let's revisit another term: big data. It describes data sets so large and so complex that traditional analysis methods cannot be applied to them to generate actionable intelligence. Big data has three key properties. The first is volume, which refers to the massive amount of data. The second is variety, which refers to the fact that the data can come from many different sources and be of many different types. The third is velocity, which indicates the need for high processing capability to keep up with the data; in practice, high processing capacity means either high-performance computing or cloud computing infrastructure.

The digital thread leads to big data. Recall that I said previously that the Internet of Things leads to a continuous stream of data. A digital thread associated with a product includes data from CAD models, performance testing, the manufacturing process, consumer use, and more. If you think about all the data collected from the different phases of product development, manufacture, and use, the volume of data will be very high.
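As a rough illustration of the variety property, here is a minimal Python sketch of digital-thread records coming from different lifecycle phases. The phase names, sources, and fields are hypothetical, invented for this example, not taken from any real system:

```python
# Hypothetical digital-thread records from different lifecycle phases.
# Note the variety: beyond a common "phase" tag, each record carries
# its own source-specific fields and units.
digital_thread = [
    {"phase": "design", "source": "CAD", "model_id": "part-001"},
    {"phase": "testing", "source": "test-rig", "max_load_kN": 12.4},
    {"phase": "manufacturing", "source": "CNC", "spindle_rpm": 8000},
    {"phase": "use", "source": "IoT-sensor", "vibration_hz": 54.2},
]

# Collect the distinct lifecycle phases represented in the thread.
phases = {record["phase"] for record in digital_thread}
print(sorted(phases))  # ['design', 'manufacturing', 'testing', 'use']
```

Even in this tiny sketch, no single fixed table schema fits all four records; at real scale, that heterogeneity is exactly what makes digital-thread data "big."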
That's why we say the digital thread leads to big data.

One way to understand big data is to compare it with traditional data sets. When we talk about traditional data sets, we mainly mean data stored in a centralized location, perhaps in a single database. Additionally, there is an assumption that the data is in a structured format. These two qualities make it very easy to run queries on traditional data sets. Traditional data sets are typically small, with the largest ranging from gigabytes to terabytes.

Big data is quite different. First, it is distributed: it comes from disparate sources at different times and locations. Additionally, there may not be any systematic structure to the data format; the data can be structured or unstructured. And big data sets contain very large amounts of data.

There are multiple challenges associated with big data processing. The first thing to understand is that processing big data requires different expertise than processing a traditional data set. It is a complex undertaking, and you need people who can influence colleagues across different departments to make it work. Working across the aisle is not always easy. The second challenge is the cost of the computational infrastructure for a big data implementation. Because big data typically involves computing platforms that can be costly, the upfront investment in analyzing big data can be substantial. The third challenge is security. Processing big data sometimes requires the use of cloud computing, and the cloud computing paradigm requires that you store and share data in the cloud. This raises information security concerns: you have to be comfortable with your data and applications being stored in the cloud. Information security for big data will be discussed in detail in another course.
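To make the traditional-versus-big-data contrast concrete, here is a minimal Python sketch using hypothetical data. The first part shows the easy query that a centralized, structured table allows; the second shows the extra per-source parsing that mixed-format data requires before the same question can even be asked:

```python
# Traditional data set: centralized, uniformly structured rows.
orders = [
    {"id": 1, "customer": "A", "total": 120.0},
    {"id": 2, "customer": "B", "total": 80.0},
    {"id": 3, "customer": "A", "total": 45.5},
]

# A simple query, analogous to:
#   SELECT SUM(total) FROM orders WHERE customer = 'A'
total_a = sum(o["total"] for o in orders if o["customer"] == "A")
print(total_a)  # 165.5

# Big-data flavor: records arrive from disparate sources in mixed
# formats, so each source needs its own parsing before aggregation.
feeds = [
    {"customer": "A", "total": 30.0},  # structured record
    "customer=A;total=10.0",           # semi-structured log line
]

def extract_total(item):
    """Pull customer A's total out of whichever format the item uses."""
    if isinstance(item, dict):
        return item["total"] if item["customer"] == "A" else 0.0
    fields = dict(part.split("=") for part in item.split(";"))
    return float(fields["total"]) if fields["customer"] == "A" else 0.0

print(sum(extract_total(item) for item in feeds))  # 40.0
```

The structured query is one line; the mixed-format version needs a parser per source, and a real system would face many more formats, spread across many machines.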