Hi, in this module, I'm going to talk about the most basic form of quantitative analysis: the construction and interpretation of tabulations. I'll be focusing on issues related to the presentation of tabulations, so that you maximize their accessibility to an audience and their ability to help clarify comparisons within your data. Tabulations are a key tool in quantitative social science, especially in the communication of results. They are most useful for examining relationships among categorical variables.
Here I've presented tabulations of the people in the United States in 1950 and 2010 according to their level of education. I'm deliberately starting out with an unappealing table, one that we wouldn't use for publication, in order to show you, step by step, the key features that should be included in a table or tabulation, again to help maximize clarity and facilitate comparison. In practice, we would not present a table like this in any publication. We would refine it according to the steps that I'm about to lay out.
The first thing that a table needs is a title. I'm still surprised at the number of papers I read, submitted by students, with tables that don't even include a title. The title should make the content of the table very clear, so that a properly titled table can stand on its own. The meaning and intent of the table should be clear from the title, without the reader having to refer to the text or rely on additional explanation. At a basic level, the title should specify the coverage: what is the population being described in the table? Here we have Adults Aged 25 and Above in the United States, 1950 and 2010. It should also specify the variables. So we're looking at educational attainment in these two years, 1950 and 2010.
Tables should normally include a note that explains the source of the data. And every table should normally include totals.
In this case, we've added the grand totals for the numbers of adults aged 25 and above in both of those years, so that the contents of the cells sum to those raw totals. Again, I'm always surprised at the number of tables I see, sometimes in publications or in manuscripts submitted to journals, that don't include totals.
Sometimes, if you are just using Word or some other word processor, there is a tendency to follow the default settings and produce a table that has a grid of lines going in every direction, up-down and left-right. I would advocate that, for publication and presentation, you use lines sparingly. Vertical lines should be avoided completely. If you look at the manuscript guidelines for most publications, they don't want vertical lines in tables that are submitted as part of a manuscript. We use horizontal lines to set apart the major parts of the table: to separate the title from the column headings, the column headings from the rows with the contents of the table, the contents from the total row, and the total row from the notes.
Now, most of the time we don't actually present tables that are just raw counts of, in this case, people. They're not very useful. It's hard to interpret any trend or pattern that might be buried in such a table. It's very hard to just look at this table and draw any meaningful conclusions about differences in the educational distribution of the United States between 1950 and 2010. So usually, if we're preparing a table, we want to percentage the table. Percentages are generally more useful than raw numbers.
The choice of the direction in which we compute our percentages should be based on the comparison that's being made. In our case, we're comparing the educational distribution of the United States in 1950 and 2010. So column percentages, that is, distributions within years, will facilitate comparison. If we compute the percentages of people within each year who have different amounts of education, and then compare those percentages across the two years, we'll get some insight into changes in the educational distribution of the United States.
Here we've converted those raw numbers into column percentages. We see very clearly now that there's been a big change in the educational distribution of the United States between 1950 and 2010. Back in 1950, 80% of the population had a Less than High School education. By 2010, only 6% of the population had such an education. Meanwhile, the shares of people who are high school graduates, have some college, or are college graduates have increased tremendously.
So here we're showing the direction in which the percentages have been computed. Again, we're computing percentages within columns, so that each column adds up to 100%. And because we've converted everything to percentages, we still want to include a row that gives the total number of observations on which these percentages were calculated. This gives the reader some sense of the size of the sample that we're dealing with.
Now, the decision about how to percentage, whether we do rows or columns, will typically follow a principle. If we think of one of our variables as a Y variable, an outcome variable, we will want to percentage the values of that variable within values of our X variable, that is, the variable that we think of as the influential or right-hand-side variable. That way we can get some sense of how Y varies according to different values of X. Here we are examining, in some sense, the effect of year, 1950 or 2010, on the educational distribution of the United States. So year is our X variable and education is our Y variable. That's why we computed the percentages within years: we get the percentages of observations with different values of Y within each value of the X variable.
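As a rough illustration of percentaging within columns, here is a minimal Python sketch. The counts are invented for the example and are not the census figures from the table in the lecture; the function name `column_percentages` is likewise just a hypothetical helper. It computes each cell as a share of its column total, then prints a 100% total row and a row with the number of observations.

```python
# Hypothetical counts (in thousands), invented for illustration only;
# they are NOT the census figures from the lecture's table.
counts = {
    "Less than high school": {"1950": 600, "2010": 100},
    "High school graduate":  {"1950": 250, "2010": 500},
    "Some college":          {"1950": 100, "2010": 700},
    "College graduate":      {"1950": 50,  "2010": 700},
}
years = ["1950", "2010"]  # year plays the role of the X variable


def column_percentages(table, columns):
    """Express each cell as a percentage of its column total."""
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    pct = {
        label: {c: 100.0 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }
    return pct, totals


col_pct, col_totals = column_percentages(counts, years)

# Print the percentages, a total-percentage row, and the N row.
print(f"{'Education':<24}{'1950':>8}{'2010':>8}")
for label, row in col_pct.items():
    print(f"{label:<24}" + "".join(f"{row[y]:>8.1f}" for y in years))
pct_sums = {y: sum(r[y] for r in col_pct.values()) for y in years}
print(f"{'Total (%)':<24}" + "".join(f"{pct_sums[y]:>8.1f}" for y in years))
print(f"{'N (thousands)':<24}" + "".join(f"{col_totals[y]:>8}" for y in years))
```

Keeping the raw totals alongside the percentages mirrors the point above: the percentages make the comparison across years easy, and the N row tells the reader how many observations they rest on.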
Now, a typical thing that we need to do is to include total percentages. That provides a visual cue as to whether percentages have been computed by rows or by columns. So here, even though in some sense it's superfluous, we added this total row with the 100%. It shows that if you sum the percentages above, they add to 100%. In other words, we've computed our percentages within columns. If we had done the opposite and provided the 100% in a column, with one 100% per row, that would be a signal that the percentages had been computed within rows; in other words, that the rows were the X variable and the columns were the Y variable.
Now, how do we choose between row and column percentages? I'm going to give an example where we flip the previous one, and we think about the row variable as the X variable and the column variable as the Y variable. Here we're looking at income as a function of education in 2010. I'll note that we've taken a continuous variable, since income was recorded in dollars in the original census data, and converted it into a categorical variable, and we compute row percentages. So we compute the percentages of people in different categories of income according to their level of education.
And then we include the total percentages that we just talked about. Here we have 100% at the end of every single row. This is a visual cue that reminds the audience that the percentages in each row, when we sum across the columns, come to 100%. And then we have a totals row, in this case with the percentages for the total population, which allows each row, that is, each level of education, to be compared with the total population.
So if we look at this table, we get a fairly clear sense of the differences in levels of income across different levels of education. We see that 29.3% of people with a college education are earning $50,000 to $99,000 a year, whereas if we look at people with less than a high school education, only 2.7% of them are earning something in that range.
So these are very simple steps, some reminders about how to lay out tables. Obviously, we could spend a lot more time talking about refinements and how to present other kinds of information in tabular form. But I hope that this will at least give you some clues, as you move forward, about some basic principles to think about as you construct tables, interpret them, and present them.
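The income-by-education example discussed earlier, where a continuous dollar amount is converted into categories and then percentaged within rows, can be sketched in Python as follows. The records and the cutpoints are assumptions made up for this illustration (only the $50,000 to $99,999 range echoes a category mentioned in the lecture), and `income_category` and `row_percentages` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical individual records (education, annual income in dollars),
# invented for illustration; not the census data from the lecture.
records = [
    ("Less than high school", 12_000),
    ("Less than high school", 18_000),
    ("Less than high school", 30_000),
    ("Less than high school", 22_000),
    ("College graduate", 45_000),
    ("College graduate", 60_000),
    ("College graduate", 85_000),
    ("College graduate", 120_000),
]

# Assumed cutpoints: (lower bound, label), in ascending order.
BINS = [
    (0, "Under $25,000"),
    (25_000, "$25,000-$49,999"),
    (50_000, "$50,000-$99,999"),
    (100_000, "$100,000+"),
]


def income_category(income):
    """Return the label of the highest bin whose lower bound is reached."""
    label = BINS[0][1]
    for lower, name in BINS:
        if income >= lower:
            label = name
    return label


def row_percentages(rows):
    """Percentage income categories (Y) within each education level (X)."""
    counts = {}
    for education, income in rows:
        counts.setdefault(education, Counter())[income_category(income)] += 1
    result = {}
    for education, c in counts.items():
        n = sum(c.values())
        result[education] = {name: 100.0 * c[name] / n for _, name in BINS}
    return result


row_pct = row_percentages(records)

# Each printed row ends with its total, the cue that rows sum to 100%.
for education, row in row_pct.items():
    cells = "  ".join(f"{row[name]:5.1f}" for _, name in BINS)
    print(f"{education:<24}{cells}  {sum(row.values()):5.1f}")
```

Here education is the row (X) variable, so the 100% appears at the end of each row rather than at the bottom of each column.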