Hello. In this subunit, we'll talk about data, and in particular about the difference between structured data and unstructured data.

So what is data? Well, we see it all the time. We've seen it on all kinds of websites, on mobile devices, on our computers, and even outside of the online world. Here's one example: GDP data from the World Bank. I'm sure you've seen something like this, a spreadsheet with country information, where the columns are labeled and one of them is GDP. This is one kind of data, which is quite common and which we encounter in many places.

Here is another example, and again this is a spreadsheet. This is SAT data from a New York City open data site, and you can see different school names here, their SAT scores, and so on. This is an example of what's called structured data, because all the data fields, in this case the school name and the SAT score broken down into reading, math, and writing scores, are marked. There are fields that are labeled, and the numbers go into those fields.

Here's another example now. This is a blog, where some things are defined, but most things are what's called unstructured. They may appear to be organized, at least in the layout, but the fields are not marked; they're not presented in a way that follows a predefined kind of data. This typically happens with text, as you see here. A blog is likely to be mostly text, but it can also have a combination of text, images, and so on.

And I'm sure you've seen something like this if you have spent any time on the Internet: something like YouTube, where it's multimedia data. We say it's multimedia because it's primarily video, but there is also a mix of text, often images, and other sources of data combined in this one place. So there are different formats here, and there are different structures here.
Some of that is structured data, the things that are marked. Other parts are unstructured because they're not well defined: they are free-form text, tags, and other things.

And here is an example from another social media site that you may have heard about, Twitter. Again, some things here are marked clearly and some are not. It's a mix of different formats and data: the basic unit, a tweet, is text, but it also often includes links, videos, images, and things like that. So some of these things are structured and others are not.

Now that we've seen these examples, we can provide a more formal definition and understanding of what structured and unstructured data are. Structured data is where you have a high level of organization. This typically happens in relational databases and in spreadsheets, which we also saw examples of.

Here's an example: a data schema, or data model, from MySQL, a database that uses SQL, which stands for Structured Query Language. It's okay if you don't know SQL or have never used it. Essentially, these are different tables: departments, employees, salaries, titles, and so on. For each of those tables, there are some characteristics of the data that would go into it. For instance, when you're talking about an employee, there is going to be their birth date, first name, last name, gender, hire date, things like that. These are defined labels: in order to describe an employee, you have a predefined set of labels that goes in there. So structured data depends on a data model that's typically already defined, and that data model tells us how the data about, for example, an employee should be stored and presented.
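
To make the idea of a predefined data model a bit more concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module rather than MySQL, simply so that it runs without a database server. The table name, the column names, and the column types are assumptions modeled on the labels mentioned above, and the sample record is made up purely for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed data model: the columns follow the labels mentioned in the lecture
# (birth date, first name, last name, gender, hire date). Types are guesses.
conn.execute(
    """
    CREATE TABLE employees (
        emp_no     INTEGER PRIMARY KEY,
        birth_date TEXT NOT NULL,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        gender     TEXT,
        hire_date  TEXT NOT NULL
    )
    """
)

# A made-up record; every value goes into a predefined field.
conn.execute(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    (1, "1985-03-14", "Jane", "Doe", "F", "2010-06-01"),
)

# The data model itself can be inspected: these are the defined labels.
columns = [info[1] for info in conn.execute("PRAGMA table_info(employees)")]
print(columns)

# Because the fields are defined, we can ask for them by name.
row = conn.execute(
    "SELECT first_name, last_name, hire_date FROM employees WHERE last_name = ?",
    ("Doe",),
).fetchone()
print(row)
```

The point of the sketch is only that every value has a named place to go, and that those names can then be used to look the data up again.
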
The nice thing is that, because there is a predefined data model where somebody tells us how to store the data, it's easy to enter that data, it's easy to store it, and it's also easy to retrieve it, because now you can ask a very specific question. For instance, you can ask, "Give me the record of an employee named Jane Doe." You're able to say first name Jane, last name Doe, from that employee table, because those fields, first name and last name, are defined. Or, if you want to find out how many employees will be retiring in the next five years, you could query this employee table, using the birth date to find out each employee's age, and use that information. You're able to ask for a specific field because those fields are defined. So that's the nice thing about structured data: since the data model is defined, all those fields are defined and can easily be stored, queried, retrieved, and analyzed.

Unstructured data, on the other hand, does not come with those kinds of labels; it does not come with a predefined data model. Here's an example, a screenshot of a Wikipedia article. A lot of the articles on Wikipedia have free-flowing text. Now, there could be markup that defines the formatting, and there could be images, videos, and links. But those things are not going into specific, predefined data model categories. This is unlike what we saw before in the employee record. So photos, graphic images, videos, PDFs, emails, all those things tend to fall under unstructured data, because the data has been generated and stored, but it's not going into specific data fields. So that's unstructured data.

When it comes to social media, we see a mix of both. Here's an example using data collected about Brexit.
These data are basically tweets, and based on the hashtags that are used, we can represent them using numbers like these and say how many tweets are using #brexit, how many tweets are using #voteleave, and so on. This is an example of structured data, because we are able to look for a specific thing and count it, and we are able to do that because the hashtags are in a structured format.

Then there is unstructured data, of course. There is a lot of text, and when you're looking at a lot of text that is not clearly labeled, you need to do a different kind of analysis and a different kind of representation. Here's an example of text on which you can do sentiment analysis. This is free-flowing text: it's not marked whether something is somebody's name, a place, or a time. It's just text, so we can't simply count or compare. Something like sentiment analysis is done on this kind of unstructured data.

So typically, structured data helps us do a lot of number-driven, quantitative analysis. With unstructured data, you can still do quantitative analysis, but there are also a lot of qualitative questions that you can ask. In this course, we'll be focusing on structured data. Most of the time, the data that we'll derive will be of that nature, so we'll be doing mostly quantitative analysis.

So what did we do in this subunit? We talked about structured data, which is where the data fields are defined and the data is organized according to those fields, or labels, so there is a preexisting data model. And we talked about unstructured data, which is when this kind of data model does not exist; there is no predefined organizing scheme. That's the main difference. And in this course, we'll be mostly dealing with structured data. So that's the end of this subunit.