This lecture is about how to mine text data with a social network as context. In this lecture we're going to continue discussing contextual text mining. In particular, we're going to look at the social network of authors as context.
So first, what's our motivation for using network context for analysis of text? The context of a text article can form a network. For example, the authors of research articles might form collaboration networks. And authors of social media content might form social networks. For example, on Twitter people might follow each other, or on Facebook people might claim others as friends, etc. So such context connects the content of the authors. Similarly, locations associated with text can also be connected to form a geographical network. In general, you can imagine that the metadata of the text data can form some kind of network if they have some relations.
Now there is some benefit in jointly analyzing text and its social network context, or network context in general. And that's because we can use the network to impose some constraints on the topics of text. So for example, it's reasonable to assume that authors connected in collaboration networks tend to write about similar topics. So such heuristics can be used to guide us in analyzing topics. Text can also help characterize the content associated with each subnetwork. And this is to say that both kinds of data, the network and the text, can help each other. So for example, the difference in opinions expressed in two subnetworks can be revealed by doing this type of joint analysis.
So here we briefly introduce a model called a network supervised topic model. In this slide we're going to give some general ideas. And then in the next slide we're going to give some more details. But in general, in this part of the course we don't have enough time to cover these frontier topics in detail. But we provide references that would allow you to read more about the topic to know the details.
But it should still be useful to know the general ideas, and to know what these models can do so that you know when you might be able to use them.
So the general idea of the network supervised topic model is the following. Let's start with viewing regular topic models, like PLSA or LDA, as solving an optimization problem. Of course, in this case, the optimization objective function is a likelihood function. So we often use the maximum likelihood estimator to obtain the parameters. And these parameters will give us useful information that we want to obtain from text data, for example, topics. So we want to maximize the probability of the text data given the parameters, generally denoted by lambda.
The main idea of incorporating the network is to think about the constraints that can be imposed based on the network. In general, the idea is to use the network to impose some constraints on the model parameters, lambda here. For example, the text at adjacent nodes of the network can be assumed to cover similar topics. Indeed, in many cases, they tend to cover similar topics. So we may be able to smooth the topic distributions on the network so that adjacent nodes will have very similar topic distributions. So they will share a common distribution over the topics, or have just slight variations of the topic coverage.
So, technically, what we can do is simply to add network-induced regularizers to the likelihood objective function, as shown here. So instead of just optimizing the probability of the text data given the parameters lambda, we're going to optimize another function F. This function combines the likelihood with a regularizer function called R here. And the regularizer is defined on the parameters lambda and the network. It tells us basically what kind of parameters are preferred from a network constraint perspective. So you can easily see this is in effect implementing the idea of imposing some prior on the model parameters.
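For readers following the transcript without the slide, the general idea just described can be written roughly as follows. This is only a sketch of the notation: the model parameters are written here as a capital $\Lambda$, $F$ is the combining function, and $R$ is the network regularizer. The exact symbols on the slide may differ.

```latex
\Lambda^{*} = \arg\max_{\Lambda} \; F\bigl(\, p(\text{TextData} \mid \Lambda),\; R(\Lambda, \text{Network}) \,\bigr)
```

When $F$ ignores $R$ entirely, this is just the usual maximum likelihood estimate of a topic model.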
The only difference from a prior is that we don't necessarily have a probabilistic model, but the idea is the same. We're going to combine the two in one single objective function.
So, the advantage of this idea is that it's quite general. Here the topic model can be any generative model for text. It doesn't have to be PLSA or LDA, or the current topic models. And similarly, the network can also be any network, any graph that connects these text objects. The regularizer can also be any regularizer. We can be flexible in capturing different heuristics that we want to capture. And finally, the function F can also vary, so there can be many different ways to combine them. So this general idea is actually quite powerful. It offers a general approach to combining these different types of data in a single optimization framework. And this general idea can really be applied to any problem.
But here, in the paper referenced here, a particular instantiation called NetPLSA was studied. In this case, it's just an instantiation of PLSA to incorporate this simple constraint imposed by the network. And the prior here is that neighbors on the network must have similar topic distributions. They must cover similar topics in similar ways. And that's basically what it says in English.
So technically we just have a modified objective function here. It's defined on both the text collection C and the network graph G. And if you look at this formula, you can actually recognize some parts, because they should be fairly familiar to you by now. So can you recognize which part is the likelihood of the text given the topic model? Well, if you look at it, you will see this part is precisely the PLSA log-likelihood that we want to maximize when we estimate parameters for PLSA alone. But the second part shows some additional constraints on the parameters. And in particular, we'll see here it's to measure the difference between the topic coverage at node u and node v.
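Since the formula on the slide is not part of the transcript, here is a sketch of an objective that matches the description in this lecture. The notation is an assumption: $c(w,d)$ is the count of word $w$ in document $d$, $\theta_j$ is the $j$-th of $k$ topics, $E$ is the set of edges of $G$, $w(u,v)$ is the weight of an edge, and the lowercase $\lambda$ is a weight on the network term (a different quantity from the model parameters called lambda earlier). The $(1-\lambda)$ and $\tfrac{1}{2}$ factors are one common way to write such an objective and may not match the paper exactly.

```latex
O(C, G) = (1-\lambda) \sum_{d \in C} \sum_{w} c(w,d) \, \log \sum_{j=1}^{k} p(\theta_j \mid d)\, p(w \mid \theta_j)
\;-\; \frac{\lambda}{2} \sum_{(u,v) \in E} w(u,v) \sum_{j=1}^{k} \bigl( p(\theta_j \mid u) - p(\theta_j \mid v) \bigr)^{2}
```

The first sum is the PLSA log-likelihood, and the second sum is the network term, which compares the topic coverage at the two ends of each edge.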
These are two adjacent nodes on the network. We want their distributions to be similar. So here we are computing the square of their differences and we want to minimize this difference. And note that there's a negative sign in front of this whole sum here. So this makes it possible to find the parameters that both maximize the PLSA log-likelihood, which means the parameters will fit the data well, and also respect this constraint from the network. And this is the negative sign that I just mentioned. Because of this negative sign, when we maximize this objective function we'll actually minimize this second term here.
So if we look further in this picture, we'll see there's also a weight of the edge between u and v here. And that's based on our network. If you have a weight that says, well, these two nodes are strong collaborators as researchers, or there is a strong connection between two people in a social network, then they would have a high weight. Then that means it would be more important that their topic coverages are similar. And that's basically what it says here.
And finally you see a parameter lambda here. This is a new parameter to control the influence of the network constraint. We can see easily that if lambda is set to 0, we just go back to the standard PLSA. But when lambda is set to a larger value, then we will let the network influence the estimated models more. So as you can see, the effect here is that we're going to do basically PLSA, but we're also going to try to make the topic coverages on two nodes that are strongly connected similar. That is, we encourage their coverages to be similar.
So here are some sample results from that paper. This slide shows the results of using PLSA. And the data here is DBLP data, bibliographic data about research articles. And the experiments have to do with using four communities of publications. IR stands for information retrieval, DM stands for data mining, ML for machine learning, and Web.
There are four communities of articles, and we were hoping to see that topic mining can help us uncover these four communities. But from these sample topics that you see here, generated by PLSA, you can see that PLSA is unable to generate four topics that correspond to our intuition. The reason is that they are all mixed together, and there are many words that are shared by these communities. So it's not that easy to use four topics to separate them. If we use more topics, perhaps we will have more coherent topics.
But what's interesting is what happens if we use NetPLSA, where the network, the collaboration network of authors in this case, is used to impose constraints. In this case we also use four topics. But NetPLSA gave much more meaningful topics. So here we'll see that these topics correspond well to the four communities. The first is information retrieval. The second is data mining. The third is machine learning. And the fourth is Web. So that separation was mostly because of the influence of the network, where we leverage the collaboration network information. Essentially, the people that form a collaboration network would be kind of assumed to write about similar topics. And that's why we have more coherent topics. And if you just use text data alone, based on co-occurrences, you won't get such coherent topics, even though a topic model like PLSA or LDA should also be able to pick up co-occurring words. So in general the topics that they generate represent words that co-occur with each other. But still they cannot generate such coherent results as NetPLSA, showing that the network context is very useful here. Now a similar model could also have been used to characterize the content associated with each subnetwork of collaborations.
So a more general view of text mining in the context of a network is to treat text as living in a rich information network environment.
And that means we can connect all the related data together as a big network. And text data can be associated with a lot of structures in the network. For example, text data can be associated with the nodes of the network, and that's basically what we just discussed with NetPLSA. But text data can be associated with edges as well, or paths, or even subnetworks. And such a way of representing text in the big environment of all the context information is very powerful, because it allows us to analyze all the data, all the information together. And so in general, analysis of text should use the entire network information that's related to the text data.
So here's one suggested reading. This is the paper about NetPLSA, where you can find more details about the model and how to estimate such a model.
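To make the NetPLSA objective discussed in this lecture a bit more concrete, here is a minimal Python sketch that only evaluates such an objective for given parameters. It does not estimate them. The function name `netplsa_objective`, the argument layout, the `(u, v, weight)` edge format, and the `(1 - lam)` weighting are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def netplsa_objective(counts, doc_topic, topic_word, edges, lam):
    """Return (1 - lam) * PLSA log-likelihood - lam * network penalty.

    counts:     (D, V) array of word counts c(w, d), one row per node
    doc_topic:  (D, K) array, row d holds p(theta_j | d) and sums to 1
    topic_word: (K, V) array, row j holds p(w | theta_j) and sums to 1
    edges:      iterable of (u, v, weight) tuples (hypothetical format)
    lam:        weight of the network term, assumed to lie in [0, 1]
    """
    # p(w | d) = sum_j p(theta_j | d) * p(w | theta_j), shape (D, V)
    word_given_doc = doc_topic @ topic_word
    # Only words that actually occur contribute to the log-likelihood.
    mask = counts > 0
    log_lik = np.sum(counts[mask] * np.log(word_given_doc[mask]))
    # Weighted squared difference of topic coverage across each edge.
    penalty = 0.5 * sum(
        w * np.sum((doc_topic[u] - doc_topic[v]) ** 2) for u, v, w in edges
    )
    return (1.0 - lam) * log_lik - lam * penalty


# Toy data (made up): two connected nodes, three words, two topics.
counts = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]])
topic_word = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
edges = [(0, 1, 1.0)]
similar = np.array([[0.5, 0.5], [0.5, 0.5]])    # same coverage on both nodes
different = np.array([[0.9, 0.1], [0.1, 0.9]])  # very different coverage

for lam in (0.0, 0.5):
    print(lam,
          netplsa_objective(counts, similar, topic_word, edges, lam),
          netplsa_objective(counts, different, topic_word, edges, lam))
```

With `lam` set to 0 the network term drops out and the value is just the PLSA log-likelihood. As `lam` grows, parameter settings whose connected nodes have different topic coverage are penalized more.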