0:09

While the basics are we want to know how many bits the program use or

bytes, 8 bytes at a time.

And actually we'll be talking in terms of millions of bytes or billions of bytes.

And actually surprisingly there is a controversy about even these basic

definitions.

Computer scientist think of a million bits as 2 to the 20th and

a billion is 2 to the 30th because that's the number of possible things that you

can fit into the 30 bytes and everything is consistent with our calculations.

Other scientists stick to one million and one billion for lot of reasons.

I will usually use a 2 to the 20th to mean a megabyte.

Now on old computers for many years use the 32 bytes machine so

that pointers were four bytes.

1:02

Just in recent years, we've mostly switched to a model where machines

are 64 bits, and pointers are 8 bytes.

That allows us to address much more memory, but pointers use much more space.

And actually, this transition caused a lot of problems initially,

because programs were using way more space than people thought they should.

You're not going to have to go through this kind of transition the way that we

did because 64-bits is definitely enough to address anything that you might

need to address.

Two to the 64th is really a huge number.

1:41

So in terms of bytes we have to start out with typical memory usage.

Now again this is very dependent on machine and implementation but

these numbers are reasonable and are found on typical implementations.

So a boolean, it would be nice if a boolean just took a bit, because it's just

true or false, but actually usually we have to count for a byte for a boolean.

One byte is a byte, character nowadays is 2 bytes, 16 bit characters.

Not that long ago, we used 8 bits for chars.

Integer, a regular int is 4.

Bytes or 32 bits and float is also 4 bytes, long int is 8 and a double is 8.

Usually we use doubles for a floating point and ints for

integers in most applications.

So that's for primitive types.

And then for arrays there is a certain amount of overhead for making an array and

then if there's N items it's whatever the cost of the primitive type times N.

So, in array of doubles is say 8N + 24.

And two-dimensional array.

Then well, we can go ahead and compute the exact thing,

but now it's time to use the tilde notation.

Even for arrays, we could say a double is tilde 8N, for one-dimensional.

For two-dimensional, two-dimensional array of doubles is tilde 8 M N.

And there's extra terms for the overhead but for

large M and N that's going to be pretty accurate.

So that's our basic usage for primitive types and

arrays in a typical job implementation.

3:50

So for example if you had a date object that had three int instance variables,

then that object would take a total of 32 bytes each int takes 4 bytes.

Object overhead is 16 bytes.

I need 4 bytes for padding, so it's a total of 32 bytes.

4:11

So, and the other one that often comes up is a string.

And a string is a little bit more complicated than an array but

the typical implementation of a string in Java has a reference

out to an array of characters and then it's got int values for

offset, count and hash value and then some padding.

And adding it all together.

The [COUGH] cost of a string is about two N plus 64 bytes.

4:48

So these are the basics that we need to analyze

the memory usage for a typical Java program.

So for data type value, if it's a primitive type it's 4 for an int,

8 for a double and so forth.

If it's a reference it's going to be 8 bytes, that's what a pointer takes.

Array, 24 bytes plus the memory for each entry, and an object, 16 bytes.

Plus the memory for the incidence variable plus if there's an inner class that's

another eight bytes as we talked about with nodes for linked lists, and

then there's the padding.

5:27

So then we have to think about who's responsible for

referenced objects In some cases.

And we'll take care of that when we get to these situations.

So as an example, a simple example of memory use analysis,

let's take a look at how much memory our WeightedQuickUnionUF

function from a few lectures ago uses as a function of N.

And there's only a couple of memory elements, and

each one of them are easily analyzed using the basics that we just gave.

It's an objects so the 16 bytes of object overhead.

There's two int arrays.

Each one of them have an array overhead

of 24 plus 4N for the N entries.

Each of the N entries takes four bytes.

And then there's four bytes for the count, and there's four bytes for the padding,

and if you add it all together.

It's 8N + 88 which tilde 8N.

Again, all that saying is when N is large all we are going to care about

in terms of analyzing the memory is that we've got [COUGH] two N integers.

7:06

We can do it with empirical analysis where we actually execute the programs to do

experiments and use assume power law, formulate hypothesis and make predictions.

But we can do more, we can do mathematical analysis where we can identify the most

costly operations, analyze the frequency of execution of those operations.

And using the tilde notation to simplify analysis,

we can actually explain the behavior, not just predict it.

And this is a fine example of the use of the scientific method to

understand artifacts that we're studying, algorithms.

Our mathematical models are usually independent of a particular

computer system and even applied to machines that are not yet built.

But we always validate our mathematical models by running experiments on real

machines, so that we can be confident when we're making predictions and

analyzing algorithms.