And this is just the sum of all the counts.
And this then allows us to solve the optimization problem and eventually find the optimal setting for theta sub i.
And if you look at this formula, it turns out that it's actually very intuitive,
because this is just the count of the word normalized by the document length,
which is just the sum of all the counts of words in the document.
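To make this concrete, the closed-form solution being described can be written out as below. This is a reconstruction from the description, where c(w_i, d) denotes the count of word w_i in document d, |d| is the document length, and N is the vocabulary size:

```latex
\hat{\theta}_i = \frac{c(w_i, d)}{|d|} = \frac{c(w_i, d)}{\sum_{j=1}^{N} c(w_j, d)}
```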
So, after all this, we have just obtained something that's very intuitive,
and it matches our intuition: we want to maximize the likelihood of the data
by assigning as much probability mass as possible to all the observed words here.
And you might also notice that this is a general result of the maximum likelihood estimator.
In general, the estimate amounts to normalized counts; it's just that sometimes
the counts have to be computed in a particular way, as you will also see later.
So this is basically an analytical solution to our optimization problem.
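As a minimal sketch of that analytical solution (assuming the document is given as a list of tokens; this code is not from the lecture), the estimator is just a normalized count:

```python
from collections import Counter

def mle_unigram(tokens):
    """Maximum likelihood estimate of a unigram language model:
    each word's probability is its count in the document divided by
    the document length (the sum of all the counts)."""
    counts = Counter(tokens)
    total = sum(counts.values())          # document length |d|
    return {word: count / total for word, count in counts.items()}

# For example: mle_unigram("text mining is about mining text".split())
# gives {'text': 1/3, 'mining': 1/3, 'is': 1/6, 'about': 1/6}.
```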
In general though, when the likelihood function is very complicated, we're not
going to be able to solve the optimization problem by having a closed form formula.
Instead, we have to use numerical algorithms, and
we're going to see such cases later as well.
So imagine what we would get if we used such a maximum
likelihood estimator to estimate one topic for a single document d here.
Let's imagine this document is a text mining paper.
Now, what you might see is something that looks like this.
At the top, you will see that the high-probability words tend to be those very
common words, often function words in English.
And this will be followed by some content words that really
characterize the topic well like text, mining, etc.
And then at the end, you also see small probabilities for
words that are not really related to the topic but
happen to be mentioned in the document.
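As a rough illustration of that kind of ranking, here is a toy example (the document below is made up, not the actual paper from the lecture) that estimates the unigram model by normalizing counts and prints the most probable words:

```python
from collections import Counter

# A toy "text mining paper" fragment, just for illustration.
doc = ("the paper presents a text mining method and the method applies "
       "the text mining algorithm to the data and then the paper reports "
       "the results").split()

counts = Counter(doc)
total = sum(counts.values())
theta = {w: c / total for w, c in counts.items()}   # MLE: normalized counts

# Rank words by estimated probability; the function word "the" ends up on
# top simply because it is the most frequent token in the document.
for word, prob in sorted(theta.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{word:10s} {prob:.3f}")
```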
As a topic representation, you will see this is not ideal, right?
Because the high-probability words are function words,
they are not really characterizing the topic.
So my question is how can we get rid of such common words?