Hello everyone. Welcome to Big Data and Language. Today, let's start using the BNC. Let me start by showing you how to search the BNC. First, go to Google. On the Google website, type BNC Corpus: B-N-C Corpus. Are you with me so far? Click search. You will see many websites. Let's click the third one, corpus.byu.edu. If you have clicked the correct website, it looks like this. BNC stands for British National Corpus. As I mentioned in the previous lecture, it contains 100 million words. You can see here: 100 million words of text from a wide range of genres.

Let me explain the interface one part at a time. At the top you can see List, Chart, Collocates, Compare, and KWIC. Let me start with List. Next, click POS. As we learned, POS stands for part of speech, such as nouns, verbs, adjectives, or adverbs. Once you click it, you will see the insert POS menu with all the different parts of speech, such as noun, verb, adjective, and adverb. There are more options here, so depending on your research question, choose the appropriate POS.

Now let me explain this part, Sections. Once you click Sections, you will see different registers, such as spoken, fiction, magazine, newspaper, non-academic, academic, and miscellaneous. Even within the spoken data there are many different types, so again, depending on your research question, feel free to choose the appropriate one. The lists under Section 1 and Section 2 are identical. This means that if you want to compare two different registers, for example spoken data and written data, you select spoken in Section 1 and a written register in Section 2, such as fiction, magazine, newspaper, non-academic, or academic, depending on your research question.

Let's start with an example: what are the most frequent nouns in the BNC, regardless of genre or register?
To answer this, click here and choose noun.ALL. You will see nn* appear in the search bar, which means all nouns. Click Find matching strings. You will then see the frequency list of all nouns in the BNC. In the BNC, the most frequent noun is time, and you can see its frequency here. The next most frequent nouns are people, way, years, year, work, government, day, man, world, life, Mr, and number. These are the most frequent nouns in the BNC regardless of genre. You can also see the total number here, as well as a chart. If you want to see examples, click a word. For example, if you click government, you will see all the examples of how the noun government is used in real data. Here you can also see details about the source text of each example, such as its text type.

So we've found the most frequent nouns regardless of genre. This time, you might want to look at particular registers, for example more formal versus less formal writing. Click Sections, and choose non-academic for Section 1 and academic for Section 2. Now we can compare which nouns are most frequently used in the non-academic register versus the academic register. Click Find matching strings again, and you will see this list.

Let me explain the columns one by one. The first column is, of course, the word list. This side is non-academic, and this side is academic. TOKENS 1 is the number of occurrences in Section 1, non-academic, and TOKENS 2 is the number of occurrences in Section 2, academic. PM 1 and PM 2 mean per million words. The non-academic and academic sections contain different total numbers of words, so their raw counts cannot be compared directly. That's why we normalize the counts: the raw count is divided by the number of words in the section and multiplied by one million. With PM 1 for Section 1 and PM 2 for Section 2, we can compare how often a word is used in non-academic and academic texts on the same scale. For example, look at colon here. Its PM 2 is 41.9.
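The two steps just described, filtering words by the nn* noun tag to build a frequency list and normalizing a count to a per-million rate, can be sketched in Python. This is a minimal illustration under stated assumptions, not how the BYU interface is implemented: the toy tagged tokens and CLAWS-style tags are made up, the counts 419 and 10,000,000 are hypothetical rather than real BNC figures, and the function names are invented for this example.

```python
import fnmatch
from collections import Counter

# Hypothetical toy corpus of (word, POS tag) pairs using CLAWS-style tags
# (e.g. NN1 singular noun, NN2 plural noun). These tokens are made up for
# illustration; the real BNC has about 100 million words.
TAGGED_TOKENS = [
    ("time", "NN1"), ("the", "AT0"), ("people", "NN0"), ("time", "NN1"),
    ("years", "NN2"), ("go", "VVB"), ("time", "NN1"), ("people", "NN0"),
]

def noun_frequency_list(tokens, pattern="nn*"):
    """Count words whose tag matches a wildcard pattern like nn* (all nouns),
    and return them from most to least frequent."""
    counts = Counter(
        word for word, tag in tokens
        if fnmatch.fnmatchcase(tag.lower(), pattern)
    )
    return counts.most_common()

def per_million(tokens_in_section, section_size):
    """Normalize a raw count to occurrences per one million words."""
    return tokens_in_section * 1_000_000 / section_size

print(noun_frequency_list(TAGGED_TOKENS))
# → [('time', 3), ('people', 2), ('years', 1)]

# Hypothetical: a word occurring 419 times in a 10-million-word section.
print(per_million(419, 10_000_000))
# → 41.9
```

Normalizing this way is what makes PM 1 and PM 2 comparable even when the two sections differ in size.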
This means that in the academic section, the noun colon is used 41.9 times per million words. You might notice that the words at the top of this list are unfamiliar rather than common words. Why? Let's go back to the search. These words are ranked by Relevance, which is based on the comparison between the two registers, so words that are distinctive of one register rise to the top even if they are not common. Under Sort/Limit, choose Frequency instead of Relevance. Even though we are comparing non-academic and academic, here we just want to know the most frequent words in the academic register and the most frequent words in the non-academic register. Because we are not ranking by comparison or relevance, we examine the raw frequency.

Let's go back and click Find matching strings again. Now you see the common words we saw before, for example, people. Regardless of genre, people was the second most frequent noun in the whole BNC, right after time. Because this list shows raw frequencies in non-academic and raw frequencies in academic, people appears here as number 1, the most frequent noun in the non-academic register, and in the academic register, time is the most frequent noun. Based on these lists, we notice that people is used a lot in the non-academic register, while time, the most frequent noun in the whole BNC, is used a lot in the academic register. Again, you can see all the frequencies in TOKENS 1 and TOKENS 2, the per-million values for Section 1 and Section 2 (PM 1 and PM 2), and the ratio.

Let's stop here. I've talked about how you can access and search the BNC, and about the first function, List. Thank you for your attention.
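To see why sorting by Relevance brings unfamiliar words to the top while sorting by Frequency brings back common ones, here is a minimal Python sketch. It assumes that the ratio is PM 2 divided by PM 1 and that Relevance ranks words by this ratio; the formula the interface actually uses may differ. The section sizes, counts, and function names are all hypothetical and are not real BNC figures.

```python
# Hypothetical section sizes, invented for illustration (not real BNC figures).
SECTION_1_SIZE = 20_000_000  # assumed word count of Section 1 (non-academic)
SECTION_2_SIZE = 16_000_000  # assumed word count of Section 2 (academic)

# Hypothetical rows: (word, TOKENS 1, TOKENS 2)
ROWS = [
    ("time", 9000, 8000),
    ("people", 9500, 5000),
    ("colon", 4, 670),
]

def per_million(tokens, section_size):
    """Normalize a raw count to occurrences per one million words."""
    return tokens * 1_000_000 / section_size

def ratio(tokens_1, tokens_2):
    """Assumed RATIO definition: PM 2 divided by PM 1."""
    pm_1 = per_million(tokens_1, SECTION_1_SIZE)
    pm_2 = per_million(tokens_2, SECTION_2_SIZE)
    return float("inf") if pm_1 == 0 else pm_2 / pm_1

def sort_by_relevance(rows):
    """Distinctive words first: the highest ratio goes to the top."""
    ranked = sorted(rows, key=lambda r: ratio(r[1], r[2]), reverse=True)
    return [word for word, _, _ in ranked]

def sort_by_frequency(rows, section):
    """Common words first, using raw TOKENS 1 (section=1) or TOKENS 2 (section=2)."""
    ranked = sorted(rows, key=lambda r: r[section], reverse=True)
    return [word for word, _, _ in ranked]

print(sort_by_relevance(ROWS))     # → ['colon', 'time', 'people']
print(sort_by_frequency(ROWS, 1))  # → ['people', 'time', 'colon']
print(sort_by_frequency(ROWS, 2))  # → ['time', 'people', 'colon']
```

In this toy data, colon is rare overall but over 200 times more frequent per million words in Section 2 than in Section 1, so a ratio-based ranking puts it first, whereas a raw-frequency ranking puts it last.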