The Tao of Statistics. Dana K. Keller

is a measurement convenience.

      Many statistics require interval levels of measurement (or could use ratio, discussed next) to yield valid results. Topics from grading differences in sections of the same course to the predicted flu infection rates for next year generally require this level of data. At a minimum, some reflection is appropriate when determining which statistics will be used with data.

      For the high school principal, most student achievement measurements are used as though they were at an interval level of measurement, as discussed earlier. The very fact that the debate continues, more than a century since its inception, is testimony to the resiliency (what statisticians call “robustness”) of the mean to minor violations of its required level of measurement.

      Many examples of the director of public health’s data are dichotomous (i.e., only two possible responses). For example, immunizations are coded for people in one of two ways, either yes (1) or no (0). These types of data generally can be used in statistical techniques that assume interval levels of measurement.
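
      One reason 0/1 coding works well in practice is that the mean of a dichotomous variable is simply the proportion of cases coded 1. The sketch below illustrates this with a made-up list of immunization codes; the values and variable names are assumptions for illustration, not data from the text.

```python
from statistics import mean

# Hypothetical immunization codes for eight people: 1 = yes, 0 = no.
immunized = [1, 0, 1, 1, 0, 1, 1, 1]

# The sum of a 0/1 variable counts the "yes" responses.
count_immunized = sum(immunized)

# The mean of a 0/1 variable is the proportion of "yes" responses.
proportion_immunized = mean(immunized)

print(count_immunized, proportion_immunized)
```

      Because the mean has this direct interpretation, treating a two-valued code as a number does not distort its meaning.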

      Examples of true interval data are somewhat rare. The most common are Fahrenheit and Celsius scales to measure temperature. In the end, the interval level of measurement is important to the proper selection, use, and interpretation of statistical methods, but it has few true examples in daily practice.

      5.D. Ratio

       The rare ruler

       The flexible measure

       Precious property

      A ratio level of measurement scale has a true zero and is the trophy of data types. Weight and height are examples. We can say that half of 100 pounds is 50 pounds, and twice 6 feet is 12 feet. In other words, we can form interpretable ratios. These types of data are almost carefree in their use with regard to their level of measurement (assumptions about their distributions are another story, which will be told shortly).
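
      A quick way to see what a true zero buys is to change units and check whether a ratio survives. The sketch below compares weight (ratio scale) with the Fahrenheit and Celsius temperature scales mentioned earlier (interval scales). The function names are chosen here for illustration only.

```python
def pounds_to_kilograms(pounds):
    # Pure rescaling: zero pounds is zero kilograms.
    return pounds * 0.45359237


def fahrenheit_to_celsius(degrees_f):
    # Rescaling plus a shift: zero Fahrenheit is not zero Celsius.
    return (degrees_f - 32) * 5 / 9


# Ratio scale: "twice as heavy" holds in either unit.
weight_ratio_lb = 100 / 50
weight_ratio_kg = pounds_to_kilograms(100) / pounds_to_kilograms(50)

# Interval scale: "twice as hot" depends on the unit, so it is not interpretable.
temp_ratio_f = 100 / 50
temp_ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

print(weight_ratio_lb, weight_ratio_kg)
print(temp_ratio_f, temp_ratio_c)
```

      The weight ratio stays at 2 in both units, while the temperature ratio changes from 2 to nearly 3.8 once the arbitrary zero point shifts.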

      The high school maintains basic health information in the nurse’s office on such things as height, weight, and inoculations, but the high school principal would likely need a good reason to be granted access to many of these data. The principal does have, however, largely unrestricted access to absentee and tardiness information. These variables are at a ratio level of measurement. Depending on how the data are coded and used, though, they could be at any of the levels of measurement. Recoding (assigning new codes after the fact) can further complicate understanding the data’s true level of measurement. Yet the principal has no hesitation in asking his office staff to do the tedious aspects of his research for him, as they are held to the same standards of security and attention to detail as he is.

      The director of public health has electronic medical information for all Medicaid recipients in her state, although restrictions on the data’s use are quite stringent. Nonetheless, for a variety of purposes, she might have access to any of the types of information, including measures that have true zeros. For example, when looking at the degree of compliance with national adult immunization guidelines, she knows the number of different immunizations appropriate to adults and the number of immunizations delivered. Knowing who received which immunizations and counting them for each adult, she can form ratio scales for immunization counts and for rates of immunization compliance. The more her data are measured on interval and ratio scales, the larger the variety of statistical techniques that will be available.
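
      The counting described above can be sketched in a few lines. The list of recommended immunizations and the per-adult records below are invented for illustration; they are not actual guidelines or data, and the function name is hypothetical.

```python
# Hypothetical set of immunizations recommended for adults (illustration only).
RECOMMENDED = {"influenza", "tetanus", "shingles", "pneumococcal"}

# Hypothetical records: which immunizations each adult received.
records = {
    "adult_a": {"influenza", "tetanus"},
    "adult_b": set(),
    "adult_c": {"influenza", "tetanus", "shingles", "pneumococcal"},
}


def compliance(received, recommended=RECOMMENDED):
    """Return (count delivered, share of recommended immunizations delivered)."""
    count = len(received & recommended)
    return count, count / len(recommended)


results = {person: compliance(shots) for person, shots in records.items()}

for person, (count, rate) in results.items():
    print(person, count, rate)
```

      Both the count and the rate have a true zero (no immunizations delivered), which is what makes statements such as "twice the compliance" interpretable.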

      6. Simplifying—Groups and Clusters

       You and I are much alike

       Over there, they are different

       I will be with those like me

       For now

      Group differences are the cornerstone for much of the social research done today. The reason is that grouping is a convenient, logical, and valid way to reduce the complexity of data. Groups are created from ideas that people have about characteristics or conditions that separate interesting parts of the data. Gender and race/ethnicity are traditional examples and are used more often than they are reflectively reviewed for their relevance to answering the questions posed.

      Clusters are groups that are created by sophisticated mathematics when researchers do not know where, or how, to separate their groups. To form clusters, statistical software needs variables, which someone must choose and code. To that extent, researchers need a fair idea of the characteristics that would distinguish between groups, more than a random guess. The more researchers know how groups were formed in the data, the better they can utilize group information to address interesting questions of the data. If I wanted to know the characteristics (i.e., through clustering) to predict whether a diabetic will have a biennial eye examination, I would start with as many demographic and health care access variables as I could find. I would not look for hat size or color preference information.
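
      To make the idea concrete, here is a minimal sketch of one common clustering method, k-means, written in plain Python. The text does not say which method is meant, so k-means is an assumption, as are the two variables (age and clinic visits per year), the six data points, and the starting centers.

```python
def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            # Squared Euclidean distance from this point to every center.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, center))
                for center in centers
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical patients: (age in years, clinic visits per year).
patients = [(25, 1), (30, 2), (28, 1), (62, 8), (65, 9), (70, 7)]

# Starting centers chosen by hand so the result is reproducible.
centers, clusters = kmeans(patients, centers=[(25, 1), (62, 8)])

print(clusters)
```

      The software finds the two clusters, but a person still chose the variables and the number of clusters. In real use the variables would usually be put on comparable scales first, since age dominates the distance calculation here.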

      Both groups and clusters are used to understand inequalities in life. Also important, they can highlight similarities. Frequently, created clusters are used as though they were naturally occurring groups. With empirical evidence forming the foundation for a given cluster, is prior knowledge and recognition of the commonalities among members of that cluster required to consider members a group? Regardless of the answer to that question, retaining a semantic distinction between groups and clusters increases the specificity of results and adds context to the discussion.

      The distinction between using groups versus clusters is whether a strong hypothesis on the source of group differences exists prior to the start of the research. Researchers have found that, lacking a strong hypothesis, letting the data speak for themselves (i.e., clustering) can provide interesting insights. Such insights, though, still need a substantive context.

      Analysis of voting patterns by age bracket is a familiar use of groups. News reporters use groups in this way to show differences in voting preferences for young, middle-aged, and older adults. Clusters, on the other hand, might be formed to find common characteristics among members of a particular voting bloc, say those who voted for an independent candidate. Often, one finds that traditional groupings are used (e.g., gender) when such groupings are not related to the question of interest, a situation that should make one pause.

      The high school principal has all manner of information on groups: year in school, sports team membership, demographics, courses, and separate sections for the same courses, to name a few. Differences in students’ grades across these groups might suggest some important inequalities in his school. If vastly different grades, for example, were being awarded for similar work by similar students, an intervention with the teachers might be warranted. The statistics would not suggest which teacher graded too high or which too low. Those are value judgments. As we have seen, statistics do not make value judgments. They are remarkably evenhanded, even though statisticians, owing to unconscious biases, might not always be (more on this later).
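
      A comparison of grades across sections can be sketched as a simple group mean. The section labels and grades below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (section, grade) pairs for two sections of the same course.
grades = [
    ("section_1", 88), ("section_1", 92), ("section_1", 90),
    ("section_2", 71), ("section_2", 75), ("section_2", 73),
]

# Collect the grades belonging to each section.
by_section = defaultdict(list)
for section, grade in grades:
    by_section[section].append(grade)

# Mean grade per section, and the gap between the two sections.
section_means = {section: mean(values) for section, values in by_section.items()}
gap = section_means["section_1"] - section_means["section_2"]

print(section_means, gap)
```

      The output reports a 17-point gap and nothing more. Whether one teacher grades too high or the other too low remains a judgment for the principal.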

      The director of public health also has access to data that easily can be formed into groups. Nonetheless, she might want to create clusters to address questions about immunization patterns. For example, from epidemiology, she knows that certain religious groups and communities are reluctant to fully immunize their children, creating a situation where childhood diseases with small or vanishing incidence in the overall population
