[MUSIC]
So, in the previous video, we talked about this concept of using Markov
processes to generate random text after we train on some data set.
And so, now we're going to dive into the implementation of this idea.
So by the end of this video, you'll be able to describe the class design
of the Java classes that we're going to be using, both in the video and
in your project, to create a Markov text generator.
And then you'll also be able to implement some of the code that's needed for
the project.
So let's think back to what we said.
So at the beginning, we're going to build a model of the patterns that are implicit
in the data that we want to emulate by training on that data.
And the model that we're building will have these states that
represent the current situation that we're in.
And then transitions between those states to say how to go forward and
how to generate more and more text.
And then once we've built our model, we're ready to go into that second stage
of the process and actually generate some text.
So let's focus on the class design first.
So remember that when we talk about class design, we have the notion of interfaces.
And with the interfaces we can specify what's the desired behavior of
our classes without going into the details of the implementation.
So when we have a MarkovTextGenerator
what we want it to be able to do is train on some input data or string.
And then we also want it to be able to generate text,
perhaps a certain amount of text, and so we want to be able to give it
an integer which says the number of words we want to generate.
Now just to make our objects more useable and
more flexible, we're also going to add the functionality of retraining.
So we want to be able to train on some data and
then maybe generate some text based on that model, then throw it away and
then retrain on new data, generate more text, etc.
So that’s going to be the interface and
these are the methods that we’re specifying in this interface.
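As a rough sketch, the interface just described might look something like the following in Java. The method and parameter names here (train, generateText, retrain, sourceText, numWords) are assumptions chosen to match the three behaviors mentioned above; the names in the actual project may differ.

```java
// Hypothetical sketch of the interface described above.
// All method and parameter names are illustrative assumptions.
interface MarkovTextGenerator {

    /** Train the generator on the given input string. */
    void train(String sourceText);

    /** Generate text containing the requested number of words. */
    String generateText(int numWords);

    /** Throw away the current model and train again on new data. */
    void retrain(String sourceText);
}
```

Notice that nothing here says how the model is stored. That decision is left to whichever class implements the interface.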
So, if we want to go ahead and implement that interface we need to define a class
that’s going to implement each one of these methods.
And let’s focus on the training piece first.
So, when we train on data, what we're training on is an input string.
And we're going to go back to the example from last time where we have that
Beatles song, Hello, Goodbye.
And so we want to think about our model capturing the quirks and
patterns in that body of text.
So that when we generate new text, we're going to emulate those quirks and
patterns as well.
So for each new word, what we need to do is keep track of the current
word that we're looking at and then what words might follow after that word.
Because our model is going to predict what next word to generate based on the current
state which we are going to think of as the current word.
And so what we're looking for are likely consecutive pairs of words. And so,
as we process the input string that we're looking at, that's our training
set, what we're trying to track is which words come after other words, so
that we can mimic that later.
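To make the idea of tracking which words come after other words concrete, here is a minimal sketch. It is only an illustration: the class name FollowerTracker and the method followers are made up for this example, and it stores the pairs in a map for brevity, which is not necessarily the structure used in the project. It also assumes that words are simply separated by whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the project code.
class FollowerTracker {

    /**
     * Maps each word in the text to the list of words that were seen
     * immediately after it, in order, keeping repeats.
     */
    static Map<String, List<String>> followers(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // nothing to train on
        }
        // Assumption: words are separated by one or more whitespace characters.
        String[] words = trimmed.split("\\s+");
        // Walk over each consecutive pair (current word, next word).
        for (int i = 0; i + 1 < words.length; i++) {
            result.computeIfAbsent(words[i], k -> new ArrayList<>())
                  .add(words[i + 1]);
        }
        return result;
    }
}
```

For the input "hello hello goodbye hello", this records that "hello" was followed by "hello" and then by "goodbye", and that "goodbye" was followed by "hello". Keeping the repeats means a word that follows more often shows up more often in the list, so picking from the list at random would favor it.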
So, in order to put that into our class design, our concrete class that implements
the interface MarkovTextGenerator is going to have a list.