In this section, we'll be discussing basic statistics. We will define several techniques used to measure central tendency and apply them through examples. The topics of this discussion are: What is statistics? What are the different types of statistics? And what are some ways to measure data using the mean, median, and mode of a data set?

So, what is statistics? It's a gathering of facts or data, typically numerical, that once tabulated or organized can present significant information about a given subject. Let's move on to some other definitions. An observation is an individual piece of data that you've collected. For example, your home address is an individual piece of data. A data set is a collection of all the observations, for instance the home addresses of everyone in your neighborhood.

There are generally two types of statistics, descriptive and inferential. Descriptive statistics involves the organization and summarization of information using charts, graphs, tables, measures of centrality, variation, and percentiles. An example might be reporting the percentage of students in this class who own a smartphone; that single percentage is a descriptive statistic. Inferential statistics, on the other hand, involves methods to measure the reliability of conclusions regarding a population based on a sample of that population. You can compute descriptive statistics from the sample, but it's more involved statistically if you use that sample to infer a conclusion about an entire population. We will get into that a little bit later.

An arithmetic mean, or simply a mean, is a popular way to describe a data set. We've always called it an average. Haven't we all averaged our grades in school? The mean is calculated the same way: the sum of the observations divided by the number of observations. The second most common measure of the center of a data set is the median.
This is the middle of the ordered data, where half of the data lies above the median and the other half lies below it. We have an example later that shows you how to find this number. A mode, not to be confused with a la mode, is determined by how frequently observations repeat. If there is no repetition, then there is no mode. The one value with the greatest frequency is the mode of the data set. In case of a tie, both values are modes.

This example is a collection of batting averages from two professional baseball leagues. The seven players on the left represent the American League and the seven on the right represent the National League. We want to know which league has the better batting average. We will also glean additional information from this data set. Let's find the mean for each league using our formula. Let's start by totaling the batting averages on each side and dividing each total by the number of players in that league, which for this example is seven. We find that the mean batting average is .473 for the American League and .305 for the National League, clearly indicating that the American League has the better batting average.

What else can we tell from this table? The median indicates the middle of the data. Rearrange the batting averages so that the numbers are listed from least to greatest. Since we have seven players, an odd number, it's easy to tell that the fourth one down from the top, which is also the fourth up from the bottom, is the middle. That means that for the American League, with a median of .500, half of the data lies above .500 and half lies below .500. For the National League, the median is .250, which is lower, but it also indicates that half of the data is below and half is above. Notice that the mean shown on the last row would not tell you this information.

The mode is the batting average with the highest frequency. We see that .667 appears twice in each league, so .667 is the mode. If we combine both leagues, we find that the .667 batting average is still the mode.
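
The three measures described above can be sketched in Python. This is a minimal illustration, not part of the lecture. The function names and the sample values are hypothetical; the values are not the batting averages from the table, and they are written as whole numbers (thousandths) to keep the arithmetic exact. Averaging the two middle values when the data set has an even number of observations is the usual convention and is an assumption here, since the example above only covers an odd count.

```python
from collections import Counter


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the data once it is ordered from least to greatest."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single observation sits exactly in the middle.
        return ordered[mid]
    # Even count (assumed convention): average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Values with the greatest frequency; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No repetition, so there is no mode.
        return []
    # In case of a tie, every tied value is a mode.
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical sample of seven observations (not the data from the table).
sample = [100, 250, 250, 300, 400, 500, 650]

print(mean(sample))    # 350.0
print(median(sample))  # 300
print(modes(sample))   # [250]
```

With seven observations the median is the fourth value in the sorted list, matching the "fourth from the top or the bottom" rule used in the example. Python's built-in `statistics` module offers `mean`, `median`, and `multimode` as well, although `multimode` returns every value when nothing repeats, rather than reporting no mode.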