HyperText Transfer Protocol is the protocol that browsers use to talk to servers. We've since used it for all kinds of other things, because it turns out to be a super-simple and super-elegant protocol. But this goes back a long time. This protocol was invented in 1990, which is more than 30 years ago now. Back then there were many protocols. You would have one piece of server software and one piece of client software, you had to match them up, and then you had to connect to the right port where that server software was listening. One of the inventions of Tim Berners-Lee and Robert Cailliau at CERN was that instead of lots of pieces of software, we would have one piece of software called the web browser, and it would be multiprotocol. And so we have this thing called the URL, or Uniform Resource Locator. We just sling these around and use them without thinking, but the URL was itself an amazing innovation in 1990, because in the old days you'd be told, "Use this piece of software and talk to this address." The URL captured three really important and very separate things. First, it allowed for multiple protocols: the one you've mostly seen is http: or https:, and ftp: or mailto: might be others you've seen. Then comes a host, and then a document. The host is a domain name, which is a nice symbolic way to name a server, and it eventually resolves to an IP address like 141.206.14.22. And /page1.htm is the document within that server. So before 1990, there were many protocols and we did many things with them. But after 1990, this one protocol, HTTP, found its way into so many awesome solutions that, other than email, it's by far the dominant protocol we use on the Internet. Like I said, it was invented at CERN by Tim Berners-Lee and Robert Cailliau to retrieve documents and images.
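To make the three parts of a URL concrete, here is a small Python sketch using the standard library's `urllib.parse.urlsplit`. The URL itself is an illustrative assumption: it combines the data.pr4e.org host used later in this lecture with the /page1.htm document mentioned above, and ftp.example.com is a hypothetical host.

```python
from urllib.parse import urlsplit

# Illustrative URL (assumed): host from later in this lecture, document /page1.htm.
url = "http://data.pr4e.org/page1.htm"

parts = urlsplit(url)
protocol = parts.scheme    # which protocol to speak, e.g. "http"
host = parts.hostname      # symbolic server name, later resolved to an IP address
document = parts.path      # which document to ask that server for

print(protocol, host, document)   # prints: http data.pr4e.org /page1.htm

# The same parsing works for other protocols a browser might see.
for other in ("ftp://ftp.example.com/file.txt", "mailto:someone@example.com"):
    print(urlsplit(other).scheme)  # prints ftp, then mailto
```

If you're online, `socket.gethostbyname(host)` performs the domain-name-to-IP-address step; the address you get back depends on where the server lives at the time.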
Now, most of the other protocols were kind of complex, because we're computer scientists: we make complex things, we love complex things, and we're very, very good at complex things. But when the web was being built, Tim and Robert weren't saying, "We're going to make the greatest thing ever." They were saying, "We're just going to make a simple thing." So they made a simple thing, and it turned out to be the greatest thing ever, because as soon as they built it, different engineers would say, "Oh wow, I can use that in a slightly different way." The grounding of it was super simple. The basic concept was: you figure out where the server is, you connect to it, you send a single command with a bit of extra data, and then you get back something, a document. It could be an image, it could be data, it could be HTML. It was really amazing, right? The underlying sockets are the things that make the phone call; HTTP is what we do once the phone call has been established.
One of the things that made the Internet so successful, starting in the 1960s, was a radical openness. There was a very friendly, respectful, we're-all-around-the-same-campfire feeling that still allowed criticism. There wasn't one super genius or one company that designed this; it was a collaboration. The collaboration was built around a set of open standards, and meetings to help build those standards, and the standards are called RFCs, or Requests for Comments. The group that builds them is called the Internet Engineering Task Force, the IETF. "Request for Comments" is a fun notion. We're looking here at a September 1981 document, which is about 40 years old now, and it's called a Request for Comments. The idea is that, as engineers, we might have a design that everyone thinks is the greatest thing ever, but we should always question it. No design is perfect. No design is beyond question. No design is beyond commentary.
So even when these documents are 40 years old, used worldwide, and dang amazing, we still say there is room for improvement. I think this one is IP, and there's a later RFC for IPv6 as well. If you start reading one, it's pretty dense, but as an engineer, it gives you all the details you need to build a router or a piece of software or whatever. And these are 100 percent public documents. The idea was that no company would own the Internet. Think about the companies that were around in the 1960s: if Burroughs or Sperry or Digital Equipment had owned the Internet, that would have been a problem, because those companies are long gone; they didn't make good choices. But the market was able to make good choices, and because the commons were these open standards and open specifications, new companies like Sun Microsystems and new operating systems like Linux could come in, and even Windows could come in, as full participants, because they just read these specs and all of a sudden they were interoperating with everybody else. This is fundamental and foundational to how the Internet works. I'm not saying you're supposed to go read them all, but it's fascinating to understand that the entire technological infrastructure of the Internet is license-free and royalty-free, and you could build a brand-new gadget and hook it to the Internet just by reading these specifications. You don't have to pay for a developer's license or anything. Dang, it's cool. Sorry, I'm just a little too excited about that.
So let's take a look at one of these in particular. We'll look at RFC 2616; it has since been superseded by newer RFCs (the numbers keep going up), but it's still a fine place to start. This is HTTP, the HyperText Transfer Protocol, and if you wanted to build a web browser or a web server, you could read this document. Go ahead, read it.
If you read far enough down, you'll say, "Oh, this is how you make a request." Blah blah blah, it's got a header, it's got a carriage return and line feed, and then a message body; you've got to do this, this is what it looks like, this is how it works. At some point you're deep in this document, on a page like this one, finding out how your browser is supposed to format its request. This tells both the client and the server what the rules of the protocol are. Now, it's a lot easier for me to just tell you. You connect to the server on a socket, usually port 80, or port 443 for encrypted HTTPS, and then the client makes the first request. It sends a line, ending with a carriage return and line feed, that contains GET (there are other commands, but this is the one we'll use), a space, a URL, a space, and then the protocol version. Right now I'm using HTTP/1.0, because I can do it by hand. So you ask for a document, and then you can optionally send some header information: things like what language the person at this browser prefers, what the capabilities of this browser are, and what version of the browser we have. So you say, "Give me this document, and here's something about the browser that's requesting it," including information like cookies that have been set. That way, if you want to know who is making this request because you've logged somebody in and set a cookie to indicate that they're logged in, that's all sent along with the GET request. But we won't send any headers here. There are actually incoming headers and outgoing headers, and we'll see both when we start looking at Firefox's debugging console.
This is an entire interaction that you can do on your laptop. Now, I'm not going to do it for you, but I would note that telnet, the program this uses, is no longer installed by default on most computers. It makes me sad.
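The request format described above (a GET line, optional headers, then a blank line, with every line ending in a carriage return and line feed) can be sketched in Python as a function that builds the exact bytes a client would send. This is a minimal sketch for HTTP/1.0, not code from the lecture; `build_request` is a hypothetical helper name, and the Accept-Language header in the example is just an illustration of an optional header.

```python
from typing import Dict, Optional

def build_request(url: str, headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build the bytes of a simple HTTP/1.0 GET request (hypothetical helper)."""
    lines = [f"GET {url} HTTP/1.0"]            # command, space, URL, space, protocol
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")       # optional headers, one per line
    lines.append("")                           # empty line = "no more headers"
    # Every line, including the final empty one, ends with CR LF ("\r\n").
    return ("\r\n".join(lines) + "\r\n").encode("ascii")

print(build_request("http://data.pr4e.org/page1.htm"))
# prints: b'GET http://data.pr4e.org/page1.htm HTTP/1.0\r\n\r\n'
print(build_request("http://data.pr4e.org/page1.htm", {"Accept-Language": "en"}))
```

Putting the full URL on the request line mirrors what the lecture describes. Most modern clients send just the path (/page1.htm) plus a Host header instead, and HTTP/1.1 servers are expected to accept either form.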
Telnet is considered a security hole because it doesn't use encrypted connections; port 80, for example, is not an encrypted connection. But you can search on your favorite search engine to find out how to install telnet. Once you're at a command line, whether it's a Windows, Mac, or Linux command line, and you've got telnet properly installed, you can type the following command: telnet data.pr4e.org 80, and then hit Enter. At that point, you're connected. Telnet was originally used to log in to computers, but if the protocol we're speaking is simple enough, and HTTP/1.0 is simple enough for us to do this, we're actually talking directly to the server. Remember, I said that these are applications: your browser is not talking to files, it's talking to an application on a server, and that server might give you back a file. So it's really talking to a piece of software, and we'll see this in more detail in the next section. Because the HTTP protocol requires that the client speak first, we have to type. If you get this working, type the wrong thing, and press Enter a couple of times, you'll see that you're talking to a piece of software, because it will basically say, "You have violated my protocol." You're not talking to a thing that has online help or a user interface or anything like that. You're talking to a server, and your job is to request some data using the proper format, because you read the specification about the proper format, right? In this particular case, it's easier to cut and paste the command, because some of these servers, if you don't type fast enough, will be like, "You're not software, you're a human. Quit playing with me." Facebook, for example: you can talk to Facebook on port 80 and see what happens, and it will time you out really fast. So it's good to have a cut-and-paste buffer.
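If you can't install telnet, Python's standard `socket` module can play the same role. This is a minimal sketch, not the lecture's own code: `talk_http` and `fetch` are hypothetical names, and the request line in the usage comment assumes the /page1.htm document used earlier. It speaks first, as HTTP requires, and then reads until the server hangs up, which an HTTP/1.0 server does after sending its response.

```python
import socket

def talk_http(sock: socket.socket, request: bytes) -> bytes:
    """Send the request first (the client must speak first), then read until EOF."""
    sock.sendall(request)
    chunks = []
    while True:
        data = sock.recv(512)
        if not data:        # b"" means the other side closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

def fetch(host: str, port: int, request: bytes) -> bytes:
    """Roughly what `telnet host port` plus pasting the request does."""
    with socket.create_connection((host, port), timeout=10) as sock:
        return talk_http(sock, request)

# Usage (requires network access; the document path is an assumption).
# Note the trailing "\r\n\r\n": the second CR LF is the extra blank line.
#   reply = fetch("data.pr4e.org", 80,
#                 b"GET http://data.pr4e.org/page1.htm HTTP/1.0\r\n\r\n")
#   print(reply.decode("utf-8", errors="replace"))
```

Splitting the conversation (`talk_http`) from the connection (`fetch`) mirrors the phone-call analogy: the socket makes the call, and HTTP is what gets said once it's connected.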
So you paste this GET command in. The other important thing is that you've got to add an extra blank line. This blank line indicates that you're not going to send any headers, like which languages or formats we would prefer as a browser. A browser has certain languages, other configuration, and login information that can get sent up to the web server along with the request for the document, and we could type those in here. But we're not going to do that in this very simple example; we're just going to press Enter, which means no more headers. If you were putting in headers, you'd type header, header, header, header, and then Enter on a blank line, and then the server would know that the request was finished. Here we just ask for a page. Then the server sends us back two pieces of information. First, it sends us headers. The first line is HTTP/1.1 200 OK, which is actually a status line. You'd have to go read the documentation to see exactly how that works, but, for example, 200 means that you got a document. Another one you might have seen is 404, as in 404 Not Found. If you go to a web page that doesn't exist, you'll usually see 404 in your browser, unless the site gives you something like a search box to go find it. The point is that there's a status, because the thing you're asking for, page1.htm, may or may not be on this server. If it is, you'll get 200 OK and the data; if it isn't, you'll get 404 Not Found. Then we get the rest of the response headers, and these look pretty much like the same format as the headers we would have sent to the server if we'd sent any. Here they tell us the date, what server software is being used, and when the page was last modified. One of the important ones is what kind of content you're about to see: the content type in this case is text/html, which means the thing coming next is an HTML document.
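Here is a small Python sketch of reading the top of a response: splitting the status line into version, code, and reason phrase, and turning the header lines into a dictionary. The function names are hypothetical, and the sample lines are made up for illustration rather than copied from a real server. Header names are lower-cased because HTTP treats them as case-insensitive.

```python
from typing import Dict, Iterable, Tuple

def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split e.g. 'HTTP/1.1 404 Not Found' into ('HTTP/1.1', 404, 'Not Found')."""
    parts = line.split(" ", 2)          # the reason phrase may itself contain spaces
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

def parse_header_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers

version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(code, reason)                                     # prints: 200 OK
print(parse_status_line("HTTP/1.1 404 Not Found")[1])   # prints: 404
print(parse_header_lines(["Content-Type: text/html"]))  # prints: {'content-type': 'text/html'}
```

Header values such as dates contain colons of their own, which is why the sketch splits each line only at its first colon.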
After the headers, there's a blank line. As we read this data, the blank line is our indication that we've finished the header, which is really metadata about the page we're retrieving, and that the actual page itself comes next. So the rest of the response is the actual data, and we know its format because we were told before the page started. It could be image/png, it could be XML (application/xml), it could be JSON; it could be pretty much anything. If this were an image, it would look like garble, garble, garble: all garbled binary stuff that we can't really read. But the browser knows what a PNG or JPEG looks like, and it shows it to you. The browser uses the content type, text/html here, to know what it's dealing with. So at this point your browser has read all of this and has both the metadata and the data for the retrieved page, and then its job is to produce a nicely rendered version of the page and show it to you. And that is the request-response cycle. Again, you don't have to do this; you can just sort of believe that it works this way. But if you want to install telnet on your system, it's not a bad thing to have. It's a classic, fun tool. It's far less useful than it was in the early days, when we used telnet to test everything; now we've got browser developer consoles that are way easier than telnet. But it's kind of cool, so feel free to try it, and you'll be able to say, "I know what's really going on."
I'm a big fan of using the console, the terminal, text-based interfaces. I think they're actually way more accessible. And in a way, all these fancy graphical user interfaces distract us from the simplicity of what's going on inside computers, because you think, "Well, where's the button?" The answer is that there's probably some command inside this computer that does what the button does.
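Going back to the blank line that separates the header from the page: here is a Python sketch of how a client might split a raw response at that first blank line and then use the content type to decide what to do with the body. `split_response` and `describe_body` are hypothetical names, the sketch assumes the server uses CR LF line endings as the specification requires, and `describe_body` is only a crude stand-in for what a real browser does.

```python
from typing import Dict, Tuple

def split_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Separate the metadata (status line + headers) from the data at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")     # the blank line ends the headers
    # Latin-1 maps every byte to a character, so decoding the head never fails.
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

def describe_body(headers: Dict[str, str], body: bytes) -> str:
    """Crude stand-in for a browser: pick a handling based on Content-Type."""
    content_type = headers.get("content-type", "").split(";")[0].strip()
    if content_type == "text/html":
        return "render as HTML: " + body.decode("utf-8", errors="replace")
    if content_type.startswith("image/"):
        return f"decode {len(body)} bytes of {content_type} and draw the picture"
    return f"unknown type {content_type!r}: {len(body)} bytes"

raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
status, headers, body = split_response(raw)
print(status)                        # prints: HTTP/1.1 200 OK
print(describe_body(headers, body))  # prints: render as HTML: <h1>Hello</h1>
```

Only the first blank line matters: everything after it is data, even if the data itself happens to contain blank lines.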
A former student of mine actually wrote the scene in, I think, Matrix 2 (yeah, The Matrix Reloaded) where Trinity is breaking into the power grid using the console. It turned out that the scene was written exactly the way a real hacker would do it, whether that's a hacker with bad intent or good intent; there are hackers with good intent who break into your system so they can tell you about your vulnerabilities, or whom you pay to do exactly that. You can take a look at analyses of this little scene. It's just one of the cool things about teaching people how to use the command line, and increasingly, teaching people how to use Linux. It's okay to know your Mac and Windows command line, but the world is increasingly a server world, and we are starting to work on web servers in this class, so you might as well start learning some Linux, because that's also an important skill. Writing some Java code or Ruby code or Django code is a skill, but knowing how to start the server, debug your application, find log files, and so on from the command line is really important in real production systems. In this class we're going to play in some simplified environments to make things a little easier, but those simplified environments go away once you go into real production in a real job. That's why I'm really into teaching you the Linux command line. So up next, we're going to show you in Python, in as few lines as I possibly can, how a browser sends the HTTP protocol, and how a server reacts to the HTTP protocol and returns documents. That's what's coming up next.