0:23

So we'll be covering several concepts that are considered to be basics in computer science.

Typically, these are concepts that computer science students learn in their first couple of years of an undergraduate computer science degree program.

So for those of you who are already familiar with this material, this could be a refresher, or you could just skip it if you already know the stuff.

In any case, you can always use this orientation lecture as a reference to keep coming back to if you encounter a term in the regular lectures that doesn't make sense to you, or that you'd like to be reminded about.

1:17

So basically, this section will discuss two different data structures.

The first one is a queue.

A queue is essentially a First-in First-out data structure.

Elements of the queue can be of any data type.

In this example here, I have elements that are integers, so

in this example right now, the queue has five elements, 3, 5, 8, 0, 7.

This is an ordered list of elements.

When you remove an element from the queue,

you remove it from the head of the queue.

In this case the head is the element, 3.

When you insert a new element, you insert it at the tail of the queue which means

that if I insert a new element,

say 1, it's going to get inserted right after 7 in this queue.

2:00

So, for instance, given this particular queue consisting of five elements: if you dequeue or remove an item, then that item will be 3.

At that point of time, the queue will only consist of 5, 8, 0,

and 7, with the head pointing to 5.

The next item that you dequeue or remove, will be 5.

After that, the queue will contain 8, 0, and 7.

And then the next item dequeued will be 8 and so on and so forth.

Dequeues or removals as well as insertions can occur concurrently.

You can remove an element while at the same time also inserting an element at the tail.
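The queue walkthrough above can be sketched in a few lines of Python (a hypothetical example, not from the lecture, using the standard library's collections.deque):

```python
from collections import deque

# The queue from the lecture, head first: 3, 5, 8, 0, 7.
q = deque([3, 5, 8, 0, 7])

q.append(1)          # enqueue at the tail: 1 goes in right after 7
head = q.popleft()   # dequeue from the head: removes and returns 3
```

After these two operations the head points to 5, just as described above.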

2:36

So, that's a queue, which is a first-in, first-out data structure.

A different data structure is the stack, which is a First-In Last-Out data structure.

Imagine a stack of dishes that you place on your table.

The dish that you put on the top, that you add the last,

will be the first one that you can remove.

All right, and that's essentially a stack.

So a removal, also known as a pop, happens from the top.

An insert, or push, also happens at the top.

Okay, so in this case if you push a new element into this

particular stack say 9, then 9 will go above 3.

After that, if you pop or remove an element,

that's going to be the last element that you added, which is 9 in this case.

The next pop after that will get 3 because 9 was just removed and

3 is now the top of the stack.

3:21

After you remove 3, the next pop or removal will get you 5, and

so on and so forth.

After that you'll get 8 and then 0 and 7.

Okay, so, once again, a removal always returns you the last item that was added, and an insertion, or push, adds an item right at the top of the stack.
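Those pushes and pops can be sketched in Python as well (a hypothetical example; Python lists make convenient stacks, with the end of the list acting as the top):

```python
# The stack from the lecture; the end of the list is the top, so 3 is on top.
s = [7, 0, 8, 5, 3]

s.append(9)       # push: 9 goes above 3
first = s.pop()   # pop: returns the last element added, which is 9
second = s.pop()  # the next pop gets 3, which is now the top of the stack
```

Further pops would then return 5, 8, 0, and 7, in that order.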

So these two data structures, queue and

stack are used very widely in computer science.

And in fact, we'll be using the notion of the stack right away when we discuss

processes soon in this particular orientation lecture itself.

3:51

What is a process?

A process is essentially a program in action.

For instance, you might write a program in a language like C, C++, Java, or, you know, Python or Ruby on Rails.

Whatever it is, it doesn't matter what the language is.

And when you execute that program, say from the command line or

from a GUI, that executing program is called a process.

For instance, here I have a C-like program, which has a function main, which in turn calls a function f1, which in turn calls a function f2.

This is shown pictorially here.

A main calling f1, calling f2.

When you instantiate this program and execute it, it becomes a process, which in this case is labeled as P1.

So the code shown here is only the program itself; the process consists of multiple other components.

The code typically doesn't change but the other components might change.

What are these other components?

Let's look at them one by one.

First of all there is a program counter,

which is an index into the code which says where the program is right now, what is

the exact line number within the code where the program is executing right now.

Next you have a stack,

which is used by the functions to pass arguments and return values.

For instance, when the main function calls f1, any arguments that it needs to pass to f1 are passed via the stack, by pushing an element on top of the stack.

f1 can then pop that element off and obtain the arguments that were passed to it.

Similarly when F1 calls F2 it then pushes on top of the stack, the arguments that

it has for f2 and f2 can retrieve these arguments by popping the top of the stack.

When f2 is done, it can return the return values to f1 also by pushing

the return values on top of the stack.

So that when f1 resumes,

it can pop the top of the stack to obtain the return values from f2.

Similarly, f1 returns its return values to main by pushing them on top of the stack.

Now, the functions themselves might declare variables.

Some variables will be local variables.

Local variables are also maintained on top of the stack.

I'm not showing them here.

And this is the reason why, when a function or method completes its execution, any local variables that were declared inside it are gone, because that part of the stack has been popped.
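The main → f1 → f2 call chain can be sketched as follows (a hypothetical Python example; the language runtime performs the stack pushes and pops behind the scenes):

```python
def f2(x):
    # f2 retrieves its argument (conceptually, by popping the stack)
    # and returns its result (conceptually, by pushing it back on).
    return x * 2

def f1(x):
    y = x + 1     # y is a local variable: it lives in f1's stack frame
    return f2(y)  # f1 passes y to f2 and receives f2's return value

def main():
    return f1(3)  # main receives f1's return value the same way

result = main()   # y is gone once f1's stack frame is popped
```

Once main returns, none of the local variables of f1 or f2 exist any more; only the returned value survives.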

In addition, functions might also create new memory by, for instance, calling malloc or new.

In this case I'm showing such a variable x.

And such variables are stored in a separate data structure,

associated with the process known as the heap.

6:14

Also associated with the process are registers, which typically refer to the recently accessed variables in the process itself.

So, for instance, this variable x is present in the heap.

Such malloc'd or new'd variables need to be explicitly freed in most cases.

However some programming languages have automatic garbage collection.

Now the reason we are discussing a process here is

really the main reason is that when we talk of distributed systems,

we are going to talk of multiple processes communicating with each other.

And we'll mostly focus on a process to process communication,

although later on we'll also talk about how procedure calls or

function calls can cross process boundaries.

In addition, the other reason we are discussing a process here is that when you

take a snapshot of a process, when I say a process takes its own snapshot,

this essentially includes everything that I'm showing here on the slide and

maybe a few more elements.

It includes the code, the program counter, the current status of the stack.

The current status of the heap and the registers and then a few other elements.

Essentially, this information is enough to resume this process from here onwards.

Okay, speaking of computer architecture, let's discuss a simplified version of computer architecture.

Computer architectures today can be very complicated, but typically, when we think of how a process executes on a computer architecture, we think of a simplified view.

So there's a CPU which executes the actual instructions which are present in

your code.

There are also a bunch of registers that are co-located with the CPU.

These are small pieces of memory that can be accessed very quickly by the CPU.

Typically there's only a small number of registers.

Typically not more than a few tens of registers.

There's also a cache which is a slightly larger memory than the collection of

registers.

And the larger a piece of memory is, the slower it is to access, so accessing the cache is slower than accessing registers, but it is still pretty fast.

Then beyond the cache there is the main memory, or random access memory (RAM), which is even larger than the cache, and therefore slower than the cache, but still pretty fast.

And then finally, there's a hard disk, or, if you like, a solid state drive or a flash drive, which offers a lot more storage than the main memory, but of course is much slower.

So as you go up from disk to memory to cache to registers, the speed increases, but the total amount of storage space decreases.

8:35

So when you write a program in C++ or Java or C and you compile it, it gets compiled to low-level machine instructions.

These machine instructions might be specific to the architecture of the machine on which you are running, or they might be virtual-machine code, like JVM bytecode.

In any case, these low-level instructions are the executable version of your program, and this is stored in the file system, on your disk.

When your program starts executing, when it becomes a process, the CPU loads its instructions in batches or, in a more simplified view, one by one.

9:47

Once in a while, you may need to store something on disk so that you have permanent storage; in case the process goes offline, you still have some data.

Say for instance, the program might open a file and write something into the file.

So in this case you may want to write those updates to the file onto the disk.

Now of course this is a very highly simplified picture.

Computer architectures can be far more complex than this today, but for

our practical purposes in this course and as far as explaining how processes work.

This will suffice.

10:20

So, next, we move on to a few mathematical topics.

The first mathematical topic is Big O notation.

Big O notation is one of the most basic ways of analyzing algorithms.

When I give you an algorithm and I say, analyze the algorithm mathematically, essentially I expect you to come up with, at the least, a Big O analysis of that algorithm.

Big O notation describes an upper bound on the behavior of an algorithm as some variable is scaled, or increased in value, all the way to infinity.

Typically, this variable is the size of the input.

Maybe the number of input elements.

11:36

So essentially, when I say an algorithm A is order of foo, this means that algorithm A takes less than c times foo time to complete (assuming time is the metric we are measuring), for some fixed constant c, beyond some input size N.

Okay, typically this foo that is present in the definition is some function of the input size N.

For instance, I could say that an algorithm is order N, which means that the algorithm A takes less than c times N time to complete, for some constant c, beyond some input size N.

Or the algorithm might be order N squared, which means that it is quadratic in time, and it takes time less than c times N squared for some constant c.

Once again, when we say O(N^2) or O(N), notice that we do not include the constant c in the definition itself.

This is because the constant is irrelevant: as long as there is some fixed constant, that is good enough for us as far as Big O notation is concerned.

Okay, so we do not state the constants in Big O notation.

12:35

So let's see a couple of examples.

So first I give you an unsorted list of, say, integers, and I ask you to search for

a specific element in the list.

This essentially means that you have to iterate through the list,

one element at a time, and search for

the element by trying to match it with each element already present in the list.

What is the worst case performance?

That's what we need for Big O notation, right?

So the worst case performance happens either when the element is not there at all in the list, so you return false, or when the element is the very last one in the list, the very last one that you match against.

Therefore, the worst case performance involves N operations, where N is the size of the list, and the number of operations involved is less than c times N, with, say, c equal to 2.

Okay, so, assuming each operation takes one unit of time, both the time taken by this algorithm, which iterates through the list, and the number of operations are order N.

That's why we say order N over here.
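The linear search just analyzed can be sketched like this (a hypothetical Python example; returning False for a missing element mirrors the lecture's description):

```python
def linear_search(lst, target):
    # Iterate through the list one element at a time: O(N) in the worst case.
    for index, element in enumerate(lst):
        if element == target:
            return index  # worst case: the match is the very last element
    return False          # worst case: the element is not there at all

found = linear_search([3, 5, 8, 0, 7], 7)    # last element: N comparisons
missing = linear_search([3, 5, 8, 0, 7], 9)  # absent: N comparisons, then False
```

Both worst cases walk the entire list, which is where the order N bound comes from.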

13:34

Let's take a different example to see how big O notation works.

This deals with insertion sorting of an unsorted list.

So suppose I give you a list of elements, say integers, which is unsorted, and I ask you to sort them, say, in increasing order of the elements.

The insertion sort algorithm works as follows.

First you create a new list that is empty.

Then you iterate through the elements in the unsorted original list.

You remove one element at a time, and

you insert that element in the new list at the appropriate position.

Essentially, you search through the new list linearly, looking for the right position where this element should be.

14:31

The first element takes one operation to insert, and the second takes up to two.

Similarly, the third element will take three operations to insert, where the first two operations compare it with the two already existing elements, and the third operation inserts it at the very end of the list.

If you go on like this, the i-th element takes i operations to insert, and you can calculate the total time required to insert all the elements as 1 plus 2 plus 3, and so on, up to N.

This is a well-known sum, which comes to N times (N plus 1) divided by 2, which is less than 1 times N squared.

So with the value c equals 1, this insertion sorting example is an algorithm that takes order N squared in time, and also order N squared in the number of operations.
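The insertion sort just described can be sketched as follows (a hypothetical Python example following the new-list formulation from the lecture):

```python
def insertion_sort(unsorted):
    result = []  # first, create a new list that is empty
    for element in unsorted:
        # Search linearly for the right position: up to i operations for
        # the i-th element, giving 1 + 2 + ... + N = O(N^2) overall.
        position = 0
        while position < len(result) and result[position] < element:
            position += 1
        result.insert(position, element)
    return result
```

Each element is removed from the original list in turn and inserted at the appropriate position in the new list, exactly as in the walkthrough above.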

15:15

Next, we look at a little bit of basic probability.

Before probability, we need to discuss a few definitions.

The first definition is a set.

A set is essentially a collection of things.

So, for instance, a set S might be the set of all humans who live in this world at the current point in time.

A subset is another set,

which is a collection of things that is a part of the larger set.

So I might describe a set S2, which is the set of all humans who currently live in Europe.

S2 is a subset of S because every element that is present in S2 is also present in S, but the reverse is not necessarily true.

15:52

Okay, so now probability.

Any event has a probability of happening, okay?

So let me give you an example.

Suppose you wake up at a random hour of the day; say you have jet lag or something.

And I ask you: what is the probability of the event that the time is between 10 a.m. and 11 a.m.?

Let's try to calculate this.

Well, there are 24 hours in a day.

So, you have a set that consists of 24 elements, one for each hour.

12 a.m., 1 a.m., 2 a.m., all the way through 10 a.m. and 11 a.m., up to 11 p.m.

Okay, so when I say an element like 1 a.m., I mean the hour between 1 a.m. and 1:59 a.m.

Similarly, 10 a.m. means between 10 a.m. and 10:59 a.m.

Out of this set of 24 elements, you pick one at random.

The probability that you pick that one special element, 10 a.m., is essentially 1 divided by 24, because you're picking one element at random and there are 24 elements in the set; you have to be lucky enough to pick that 10 a.m. element.

So if you wake up at a random hour of the day, the probability that the time is between 10 a.m. and 11 a.m. is 1 over 24.

Now there will be multiple events and you might need to calculate the probability of

conjunctions or disjunctions of these events.

I'll describe what these are soon.

So say E1 is one event and E2 is another event.

And E1 and E2 are independent of each other.

This means that E1 does not influence E2, E2 does not influence E1.

Then the probability that E1 and

E2 both happen is essentially the multiplication of these probabilities.

So you take the probability of E1.

Multiply it by the probability of E2, and

that gives you the probability that both E1 and E2 are true.

Let's discuss an example of this.

Suppose you have three shirts.

They're colored blue, green and red.

And also you wake up at a random hour and blindly pick a shirt.

You put your hand into your closet and you blindly pick out a shirt.

What is the probability that you woke up between 10 a.m. and 11 a.m., and that you picked a green shirt to wear?

Well, the probability that you woke up between 10 and 11 is essentially 1 over 24, like we calculated on the previous slide.

The probability that you picked the green shirt is essentially one out of three, because there are three colors of shirts: you have three shirts, and you have to be lucky enough to pick the green one, so that's one over three.

So the probability that both these events are true, that you woke up between 10 and 11 and that you wore the green shirt, is the multiplication of the probabilities of those events.

So it's 1 over 24 times 1 over 3.

And that gives you 1 over 72.
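That calculation can be checked with exact fractions (a hypothetical Python example using the standard library's fractions module):

```python
from fractions import Fraction

p_wake_10_to_11 = Fraction(1, 24)  # one hour out of 24
p_green_shirt = Fraction(1, 3)     # one shirt out of three

# For independent events, P(E1 and E2) = P(E1) * P(E2).
p_both = p_wake_10_to_11 * p_green_shirt  # 1/24 * 1/3
```

Using Fraction rather than floating point keeps the arithmetic exact, so the product is exactly 1 over 72.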

One of the things that you have to be careful about here is that if E1 and E2 are not independent, meaning that they influence each other in some way, then you cannot multiply their probabilities with each other.

18:54

A different thing that we need to do with two event probabilities is to OR them.

So if I give you two events, E1 and E2, and

I ask you the probability that of either E1 or E2 happening.

Then you can see that the probability of E1 or E2 is the probability of E1 plus the probability of E2 minus the probability of E1 and E2.

Okay, so this last term here is the probability of the intersection.

This last term over here.

Then you can write the inequality probability of E1 or E2 is less than or

equal to the constituent probabilities.

Okay.

Sometimes we use these inequalities to upper bound the probabilities of such disjunctions, as we see here.
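As a sketch, here are both the exact formula and the upper bound applied to the earlier (independent) events, again with exact fractions (a hypothetical Python example):

```python
from fractions import Fraction

p1 = Fraction(1, 24)  # E1: woke up between 10 and 11
p2 = Fraction(1, 3)   # E2: picked the green shirt
p_and = p1 * p2       # independence lets us multiply: 1/72

# Inclusion-exclusion: P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
p_or = p1 + p2 - p_and

# Without knowing the intersection term, we can still upper bound the OR:
upper_bound = p1 + p2
```

Here 1/24 + 1/3 - 1/72 works out to 13/36, which is indeed no larger than the sum of the two constituent probabilities.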

DNS refers to the Domain Name System.

It's essentially a name resolution system.

It is run as a collection of servers throughout the world,

and it helps you to translate URLs, actually domain names into IP addresses.

So the input to the DNS is a domain name, such as coursera.org, and this is a human-readable string that uniquely identifies the object.

So we humans can easily remember a name like Coursera.org.

Actually we do need to remember URLs like class.coursera.org/foo and when you enter

this into your browser, your browser extracts the domain name from this URL and

then sends this off as a query to the DNS system.

20:11

The DNS system then returns to you the IP address of a web server

that potentially hosts that content.

And an IP address is essentially a target, or an address, or an ID; it is a unique string that points to the object.

The difference between an ID and a name, the output and the input of a name resolution system like DNS, is essentially that IDs are harder to remember.

It's harder for us human beings to remember IP addresses.

It's much easier for us to remember names.

Also, IP addresses change, but names don't.

Now, an IP address might refer to an actual web server, a unique web server that actually hosts the content, so that your client, a browser, can open a TCP connection and send an HTTP request to that server.

Or it might refer to an indirection server, or even a group of servers, such as in a content distribution network like the one run by Akamai.

The fun thing about name resolution systems is that they can be chained: the ID or address obtained from one name resolution system can be given as a name to another name resolution system, which can return a different address, and so on and so forth.

And you can keep doing this until you reach your final object itself.
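As a small sketch of name resolution from code (a hypothetical Python example: it asks the operating system's resolver, which in turn may query DNS, and the addresses returned for a real domain will vary by time and location):

```python
import socket

def resolve(domain_name):
    # Ask the system resolver to translate a name into IP addresses.
    infos = socket.getaddrinfo(domain_name, None)
    return sorted({info[4][0] for info in infos})

# "localhost" resolves locally without touching the network;
# a name like "coursera.org" would be looked up via DNS instead.
addresses = resolve("localhost")
```

The same call with a real domain name returns the public IP addresses that a browser would then connect to.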

21:14

Graphs.

So when we say graphs in computer science, we don't necessarily mean plots,

which have x and y axes.

A graph is essentially a network.

So here I am showing you a graph of some cities, Amsterdam, Boston, Chicago, Delhi, and Edmonton, and the travel times between those cities by air, rounded to the nearest hour.

Okay, so travelling from Amsterdam to Boston takes about six hours.

Travelling from Chicago to Delhi takes about 14 hours.

Say these are the flights available on some particular airline.

22:12

Also when I say that A is adjacent to B,

this means that A has an edge that connects it to B directly.

Typically an edge only goes between two nodes or two vertices.

An edge does not cross three nodes.

So here for instance I have one, two, three, four,

five, vertices and how many edges do I have?

I have an A-E edge, an A-B edge, a B-E edge, a B-C edge, and a C-D edge, so I also have five different edges, each with its own edge weight.
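That graph can be represented as a weighted adjacency list (a hypothetical Python example; the 6-hour and 14-hour weights come from the lecture, while the remaining edge weights are made-up placeholders):

```python
# Each vertex maps to its neighbors and the edge weights (flight hours).
# Only the Amsterdam-Boston (6) and Chicago-Delhi (14) weights are from
# the lecture; the other weights are invented for illustration.
graph = {
    "Amsterdam": {"Boston": 6, "Edmonton": 9},
    "Boston":    {"Amsterdam": 6, "Edmonton": 4, "Chicago": 2},
    "Chicago":   {"Boston": 2, "Delhi": 14},
    "Delhi":     {"Chicago": 14},
    "Edmonton":  {"Amsterdam": 9, "Boston": 4},
}

def adjacent(a, b):
    # A is adjacent to B if an edge connects them directly.
    return b in graph.get(a, {})

num_vertices = len(graph)
# Each undirected edge is stored twice (once per endpoint), so halve the count.
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
```

With this representation, Amsterdam is adjacent to Boston, but not to Delhi, since no single edge connects them directly.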

22:46

So, in this lecture, we have covered basic data structures, such as a queue and a stack.

We have seen that processes are programs in action, and

that they consist of multiple different pieces.

Processes, of course, run on computer architectures, and it's important for you to know at least a simplified version of the computer architecture, so that you know what's operating where when you run a process.

We've seen some mathematical topics, such as Big O notation, which analyzes the worst-case performance of any given algorithm, and some basic probability.

And finally, we have seen a few miscellaneous topics.

I hope you keep coming back to this as a reference whenever you need it throughout the course.

[MUSIC]