Hello everyone. So in this module we'll give an introduction to data and data processing.
Basically, we're going to explain what kinds of data models exist,
what kinds of data exist,
and how we can process and manage this data.
First, let's see what data is.
If you look at the world we're living in today,
we have tons of users interacting with social media,
we have billions of mobile devices,
and it's projected that we'll have trillions of sensor devices
installed at traffic intersections and monitoring the environment.
We have data collected from homes using smart city technology,
we have smart cars,
and we have wearable devices.
So there are tons of devices that sense and monitor
the environment, and they collect data about the physical world.
As you can see from that,
the amount of data we collect is huge,
and we need some kind of system,
or set of systems, that helps us understand this data and make sense of it.
So the idea is this:
we have seen that we have data.
That's a reality, but, as we mentioned,
we also need some kind of system that helps us process this data
and extract information or knowledge from it.
In the transition from data to information,
there is something in the middle that needs to be implemented
and put in place to help us extract knowledge from this data.
But to do that,
let's first see what characteristics this data has.
This will bring us to talk about big data.
What we mean by big data here is that,
based on the applications and devices
we've seen collecting data from the environment,
the volume of available data is really big.
If you have billions of devices and people generating data,
and these days we interact with these devices on a second-by-second basis,
this means we're collecting data from these devices
at a scale that may even exceed a petabyte per day.
That's a huge volume of data.
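To make the petabyte-per-day scale concrete, here is a back-of-envelope estimate. It is only a sketch: the device count, reading size, and reporting rate below are hypothetical inputs chosen for illustration, not figures from this lecture, and a "petabyte" is taken as the decimal unit of 10^15 bytes.

```python
# Back-of-envelope estimate of daily data volume.
# All inputs are hypothetical; they are not measurements from the lecture.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
BYTES_PER_PETABYTE = 10**15     # decimal petabyte (PB)


def daily_volume_bytes(devices: int, bytes_per_reading: int, readings_per_second: int) -> int:
    """Total bytes produced per day if every device reports at a constant rate."""
    return devices * bytes_per_reading * readings_per_second * SECONDS_PER_DAY


def daily_volume_petabytes(devices: int, bytes_per_reading: int, readings_per_second: int) -> float:
    """Same estimate, expressed in decimal petabytes."""
    return daily_volume_bytes(devices, bytes_per_reading, readings_per_second) / BYTES_PER_PETABYTE


if __name__ == "__main__":
    # Assumed: one billion devices, each sending one 100-byte reading per second.
    pb = daily_volume_petabytes(devices=1_000_000_000, bytes_per_reading=100, readings_per_second=1)
    print(f"{pb:.2f} PB per day")  # 8.64 PB per day
```

Even with small 100-byte readings, a billion devices reporting once per second already produce several petabytes per day under these assumptions.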
And if you don't have the infrastructure to deal with this volume,
you won't be able to make sense of this data, extract
knowledge from it, or build applications on top of it.
It's not only about the volume;
the data is also growing at a staggering rate.
What we mean by this is that
data keeps coming in from these devices, for instance the ones that monitor the environment,
at a very high rate.
That means we also need systems
that can digest data arriving at this high rate.
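One common way to keep up with high-rate data is to process each event as it arrives and keep only small running summaries instead of storing every reading first. The sketch below illustrates that idea with an incremental per-sensor mean; the `(sensor_id, reading)` event format and the function names are hypothetical, and this is a generic streaming technique rather than any specific system covered in the course.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class RunningStats:
    """Constant-size summary of all readings seen so far for one sensor."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        # Incremental mean: past values never need to be kept in memory.
        self.count += 1
        self.mean += (value - self.mean) / self.count


def consume(stream: Iterable[Tuple[str, float]]) -> Dict[str, RunningStats]:
    """Process hypothetical (sensor_id, reading) events one at a time, in arrival order."""
    stats: Dict[str, RunningStats] = {}
    for sensor_id, reading in stream:
        stats.setdefault(sensor_id, RunningStats()).update(reading)
    return stats


if __name__ == "__main__":
    # A generator stands in for an unbounded stream of incoming events.
    events = (e for e in [("s1", 20.0), ("s2", 5.0), ("s1", 22.0), ("s1", 24.0)])
    for sensor, s in consume(events).items():
        print(sensor, s.count, s.mean)
```

Because each event costs a fixed amount of work and memory, the per-event cost does not grow with how much data has already arrived.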
Another challenge that comes with big data is
the variety of sources this data comes from.
As we've seen on this slide,
these are just a few examples of the sources data can come from.
These are all different sources,
and they're not interconnected.
So the variety of sources also adds the challenge of how to integrate this data.
That's a really big challenge:
how can we deal with the various sources the data is collected from?
And because the data comes from different sources,
it is created or generated in totally different formats.
So integrating this data to build
an application on top of it, or to extract knowledge from it, is also a challenge.
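A minimal sketch of what integration can look like in practice: two sources report the same kind of measurement in different formats (CSV with Fahrenheit, nested JSON with Celsius), and each is mapped onto one common schema before the records are merged. The sample data, field names, and target schema are all hypothetical.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical inputs: the same kind of measurement arriving in two formats.
SENSOR_CSV = """device,temp_f,ts
thermo-1,68.0,2024-01-01T10:00:00
"""

PHONE_JSON = '[{"id": "phone-7", "reading": {"celsius": 21.5}, "time": "2024-01-01T10:00:05"}]'


def from_csv(text: str) -> List[Dict]:
    """Map CSV rows onto the common schema (source_id, temp_c, timestamp)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "source_id": row["device"],
            # Convert Fahrenheit to Celsius so both sources share one unit.
            "temp_c": round((float(row["temp_f"]) - 32) * 5 / 9, 2),
            "timestamp": row["ts"],
        })
    return rows


def from_json(text: str) -> List[Dict]:
    """Map nested JSON records onto the same common schema."""
    return [
        {"source_id": r["id"], "temp_c": float(r["reading"]["celsius"]), "timestamp": r["time"]}
        for r in json.loads(text)
    ]


def integrate() -> List[Dict]:
    # Normalize each source separately, then merge and order by time.
    records = from_csv(SENSOR_CSV) + from_json(PHONE_JSON)
    return sorted(records, key=lambda r: r["timestamp"])


if __name__ == "__main__":
    for record in integrate():
        print(record)
```

The design choice here is one adapter per source feeding a shared schema, so adding a new source means writing one more adapter rather than changing every downstream application.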
So what we want here is to build systems,
data processing systems and data management systems,
that can cope with the volume,
velocity, and variety aspects of big data
that we've just seen.
These days,
we have a plethora of data processing systems;
some of them are very classic,
and some of them are relatively new.
We have the classic relational database system,
which we're going to talk about later.
The main idea is that it represents entities and
objects in the world using tables and relationships between these tables.
This has been used for operational workloads,
for example in banking systems, flight reservation systems,
and libraries, to store data and allow users to retrieve information from it.
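To illustrate the relational idea of entities stored in tables and linked by relationships, here is a small banking-style sketch using Python's built-in `sqlite3` module with an in-memory database. The `customer` and `account` tables and their rows are hypothetical examples; the query shows a typical operational request: retrieve one customer's accounts by following the relationship between the two tables.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each account row refers to exactly one customer row.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    balance     REAL NOT NULL
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(10, 1, 500.0), (11, 1, 250.0), (12, 2, 90.0)],
)
conn.commit()

# Operational query: follow the customer -> account relationship with a join.
rows = conn.execute("""
    SELECT c.name, a.account_id, a.balance
    FROM customer AS c
    JOIN account AS a ON a.customer_id = c.customer_id
    WHERE c.name = ?
    ORDER BY a.account_id
""", ("Alice",)).fetchall()

print(rows)  # [('Alice', 10, 500.0), ('Alice', 11, 250.0)]
```

Because the relationship is declared as a foreign key, inserting an account for a customer that does not exist is rejected by the database rather than silently stored.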
There are also NoSQL data systems,
which are relatively new.
These move away from
the relational model and work on data that does not follow a fixed tabular structure.
So the data is not in the tabular format used in relational databases.
It can come,
for example, in graph format,
it can come in textual formats, like documents,
or it can come in a more relaxed format, which is the key-value store.
These systems still run queries or applications that extract knowledge from the data,
but they work on unstructured data.
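The three non-tabular formats mentioned above can be sketched with plain Python data structures. This is only an illustration of the shape of the data in each model, under the assumption that ordinary dictionaries, lists, and sets stand in for a real store; it is not the API of any actual NoSQL system, and all keys, documents, and names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Key-value model: an opaque value looked up by a single key.
kv_store: Dict[str, str] = {}
kv_store["session:42"] = "user=alice;cart=3"

# Document model: nested, self-describing records; fields may differ per document.
documents: List[dict] = [
    {"_id": 1, "title": "Intro to data", "tags": ["data", "intro"]},
    {"_id": 2, "title": "Graphs", "author": {"name": "Bob"}},  # no "tags" field
]

# Graph model: entities as nodes, relationships as edges (adjacency sets here).
follows: Dict[str, Set[str]] = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": set(),
}


def find_by_tag(docs: List[dict], tag: str) -> List[int]:
    """Return ids of documents carrying `tag`; documents without tags are skipped."""
    return [d["_id"] for d in docs if tag in d.get("tags", [])]


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first traversal: every node reachable from `start` by following edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


if __name__ == "__main__":
    print(kv_store["session:42"])
    print(find_by_tag(documents, "data"))
    print(sorted(reachable(follows, "alice")))
```

Each model favors a different kind of query: direct lookup by key, filtering over flexible records, or traversing relationships.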
There are other kinds of systems too,
another family of systems that focuses mainly on analytics.
They're not operational,
meaning they don't handle operational or transactional workloads;
instead they focus on how we can run
large-scale analytics applications on top of massive-scale data.
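Where an operational query touches a few records, an analytical query typically scans and aggregates everything. Here is a minimal single-process sketch of a MapReduce-style aggregation, one common pattern for large-scale analytics (named here as an illustrative choice, not necessarily the one this course uses). The data is split into partitions, each partition is summarized independently, and the partial results are merged. The `(region, amount)` record layout and sample data are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List, Tuple

Sale = Tuple[str, float]  # (region, amount); hypothetical record layout


def map_partition(partition: Iterable[Sale]) -> Counter:
    """Summarize one partition on its own, so partitions could run in parallel."""
    totals: Counter = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def merge(a: Counter, b: Counter) -> Counter:
    """Combine two partial results without mutating either input."""
    merged = Counter(a)
    merged.update(b)  # Counter.update adds values for shared keys
    return merged


def total_by_region(partitions: List[List[Sale]]) -> Dict[str, float]:
    # Map every partition, then reduce the partial totals into one result.
    return dict(reduce(merge, map(map_partition, partitions), Counter()))


if __name__ == "__main__":
    partitions = [
        [("north", 10.0), ("south", 5.0)],
        [("north", 2.5)],
        [("south", 1.0), ("east", 4.0)],
    ]
    print(total_by_region(partitions))
```

In a real analytics system the partitions would live on different machines; the point of the pattern is that the per-partition work needs no coordination until the final merge.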
In this course,
we're going to cover several aspects of each of these types of systems.