And so you could do regular expressions, you could do finds, you could do splits,
you could do all kinds of things, and you would quickly find it'll work for
the first two pages, but the third through the thousandth page blows up, because:
"I didn't realize that's what the href...
I didn't realize you could put a newline there.
Who would have put a newline there? Why would they do that?" And then you fix that.
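
To make that kind of failure concrete, here is a small sketch. The HTML string and both regular expressions are made up for illustration; they are not taken from the course code. The first pattern assumes the href follows the tag name after exactly one space, which is the sort of assumption that holds for the first couple of pages.

```python
import re

# Hypothetical page text: the second link has a newline between "<a" and "href".
html = ('<a href="http://example.com/one">One</a>\n'
        '<a\nhref="http://example.com/two">Two</a>')

# Naive pattern: expects '<a href="' with a single space.
naive = re.findall(r'<a href="([^"]*)"', html)

# Patched pattern: allows any run of whitespace, newlines included.
tolerant = re.findall(r'<a\s+href="([^"]*)"', html)

print(naive)     # only the first link is found
print(tolerant)  # both links are found
```

Even the patched pattern still misses single-quoted values, uppercase tags, and tags where another attribute comes before href, so each fix tends to uncover the next variation.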
And you realize, after you try to fix this and you try to parse all the links, that
there are just so many variations, and someone has already done that.
And it's a library called BeautifulSoup, from a place called crummy.com.
I think the naming of this is all sort of a tongue-in-cheek nod to what
a mess HTML is.
And so instead of calling it HTML super parser, they just call it something silly,
because it's kind of a silly problem: HTML on the web is just so bad.
And so it's kind of fun.
But once you use this
you've taken the easy way out.
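
As a rough sketch of what the easy way looks like: this assumes BeautifulSoup 4 is already installed and importable as bs4, and it uses Python's built-in "html.parser" as the underlying parser. The HTML string is made up for illustration, with a newline inside the tag, uppercase names, single quotes, and an extra attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical messy page text, not from the course materials.
html = ('<a href="http://example.com/one">One</a>\n'
        '<A\nHREF=\'http://example.com/two\' class="x">Two</A>')

# Let the library deal with the variations in how the tags are written.
soup = BeautifulSoup(html, 'html.parser')

# Collect the href attribute of every anchor tag.
links = [tag.get('href') for tag in soup.find_all('a')]
print(links)
```

Both links come back here without any special handling for the newline or the quoting style.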
Now, the first thing that you've got to do is install BeautifulSoup, and
there are two ways to do it.
I put it in the code folders and in all the zip files.
So you can either go and follow the instructions at their site and
install it, and that installs it for all Python programs.
That's fine if you can figure that out, but sometimes you're running on
a campus computer and you can't actually install software on it, or maybe you're
bringing your Python programs on a USB stick or on a shared drive or something.
And so you kind of have to install it locally:
you can download this file, bs4.zip, and then unzip it.
And so if you put it in a folder with the URL links, this one.