The Tao of Statistics. Dana K. Keller

ground

       Turn over rocks

       Dig in the dirt

      The grounding for statistics is the level of measurement of the data. Some statistics are appropriate for some levels of measurement; others are not. This is an area where one needs to understand the deeper structure of the data to know which statistics would be meaningful. For example, the data’s level of measurement limits the choice of the most frequently used statistic: the average, which statisticians call the central tendency. There are three common choices of averages: the mean, median, and mode (with somewhat esoteric versions within each). These different types of averages are not equally appropriate for data at different levels of measurement. Specific levels of measurement and the impact of each on the choice of statistics will be discussed soon.
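
      As a small illustration of the three averages named above, the sketch below computes each one for the same made-up list of numbers using Python's standard `statistics` module. The data are hypothetical and chosen only so that the mean differs from the median and the mode.

```python
from statistics import mean, median, mode

# Hypothetical data: five values, one of them much larger than the rest
values = [2, 3, 3, 4, 18]

average_mean = mean(values)      # sum of the values divided by their count
average_median = median(values)  # middle value once the data are sorted
average_mode = mode(values)      # most frequently occurring value

print("mean:", average_mean)
print("median:", average_median)
print("mode:", average_mode)
```

      The single large value pulls the mean upward while the median and mode stay put, which is one reason the choice among the three matters.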

      The topic sounds complicated, but is not. Once you understand how data differ according to their level of measurement, you will quickly grasp which statistics are appropriate for a given set of conditions. Fortunately, many statistical techniques have options that can account for the various levels of measurement.

      Through the questions being asked by the high school principal and the director of public health, we will encounter four levels of measurement (i.e., nominal, ordinal, interval, and ratio, to be explained next) in their various data sets or from potential survey responses. They, and we, will accommodate these levels of measurement as we progress through this book.

      Getting the statistics right depends on recognizing and accommodating each variable’s level of measurement. Even researchers with decades of experience are occasionally embarrassed to find that they have used a statistic in a way inconsistent with its level of measurement requirements. Though an issue with level of measurement is rarely a knockout punch, such issues limit, to varying degrees, the confidence that researchers (should) have in their results.

      5.A. Nominal

       Nominal says different

       No more does it claim

       Others shouldn’t either

      The nominal level of measurement is about categories. Some statisticians refer to it as the categorical level of measurement. The categories have characteristics that differ but are not quantified as to the amount of the difference. For example, political party, religious affiliation, gender, and so forth can be recorded, grouped, and counted. Yet we do not say, for example, that one religion is more of a religion than another.

      Under one condition, the most typical type of average, the mean (i.e., arithmetic average), is appropriate for nominal data: the variable has only two possible responses, and talking about the percentage that corresponds to one of those responses makes sense. With gender coded 0 for female and 1 for male, for example, the mean of the codes is the proportion of males, so a mean of 0.60 says that the group is 60% male.
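
      The arithmetic behind the 0/1 coding can be checked in a few lines. This is a minimal sketch with a hypothetical group of ten people; the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical group of ten people, coded 0 = female, 1 = male
gender_codes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# Each 1 adds one to the sum and each 0 adds nothing, so the mean
# of the codes equals the share of the group that is coded 1.
proportion_male = mean(gender_codes)
percent_male = 100 * proportion_male

print(f"{percent_male:.0f}% male")
```

      The same trick does not extend to a nominal variable with three or more categories, because the numeric codes would then imply an order and spacing that the categories do not have.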

      Variables coded and interpreted as we have just seen find use in a variety of statistical techniques requiring at least interval levels of measurement (a discussion about interval level data is coming shortly). Some nominal data, therefore, can be quite useful in answering a surprisingly broad range of questions.

      Both the high school principal and the director of public health have nominal data. The high school principal has data on gender, school club membership, sports participation, and scholastic topics for each student. Some aspects of these measures could be coded as nominal, such as variables for the names of extracurricular activities (e.g., yearbook).

      The director of public health has access to a host of demographic data that are nominal, such as ethnicity and zip code. Generally, nominal data are summarized in tables or cross-tabulations of two characteristics, such as sports participation by gender or immunization rates by age or age grouping. Nominal data also delineate many of the groups of interest to research. Nonetheless, even in naming her categories, the director needs to be careful of racial and ethnic sensitivities (e.g., should a category be Black, Black American, African American, or combined with other categories into Americans of Color?), or some stakeholders could be offended and not pay attention to the point of her report.
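
      A cross-tabulation is simply a count of how often each pair of categories occurs together. The sketch below builds one for sports participation by gender from a handful of invented student records; the records and field layout are assumptions made for illustration, not the principal's actual data.

```python
from collections import Counter

# Hypothetical student records: (gender, plays_a_sport)
students = [
    ("female", "yes"), ("female", "no"), ("male", "yes"),
    ("male", "yes"), ("female", "yes"), ("male", "no"),
]

# Count each (gender, sport) combination
counts = Counter(students)

genders = sorted({gender for gender, _ in students})
answers = sorted({answer for _, answer in students})

# Print a small table: one row per gender, one column per answer
print(f"{'':8}" + "".join(f"{answer:>6}" for answer in answers))
for gender in genders:
    row = "".join(f"{counts[(gender, answer)]:>6}" for answer in answers)
    print(f"{gender:8}" + row)
```

      Only counting is involved, which is why this kind of table suits nominal data: no category is treated as more or less than another.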

      5.B. Ordinal

       With distances unsure

       Blindly even steps

       Arrive at cracks

      Ordinal measurement is common for opinion polls. We can distinguish between levels of agreement but cannot be sure that the psychological distances between pairs of adjoining response choices are equivalent. For example, the psychological distance between “strongly disagree” and “moderately disagree” might not be the same as the distance between “neutral” and “moderately agree.” In these cases, an arithmetic average (the mean) might not yield an interpretable answer.
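
      One way to see the problem is to code the same responses with two different sets of numbers, both of which respect the order of the scale. In the sketch below, the scale labels, both codings, and both groups are invented for illustration. Which group has the higher mean depends on the coding chosen, while the median response does not.

```python
from statistics import mean, median_low

scale = ["strongly disagree", "moderately disagree", "neutral",
         "moderately agree", "strongly agree"]

# Two hypothetical codings; both keep the categories in the same order.
codings = {
    "even steps": dict(zip(scale, [1, 2, 3, 4, 5])),
    "uneven steps": dict(zip(scale, [1, 2, 3, 6, 10])),
}

group_x = ["neutral"] * 4
group_y = ["moderately disagree"] * 3 + ["strongly agree"]

def median_label(responses):
    """Return the (lower) median response, using only the order of the scale."""
    ranks = [scale.index(r) for r in responses]
    return scale[median_low(ranks)]

means = {}
for name, coding in codings.items():
    mean_x = mean([coding[r] for r in group_x])
    mean_y = mean([coding[r] for r in group_y])
    means[name] = (mean_x, mean_y)
    print(f"{name}: mean of X = {mean_x}, mean of Y = {mean_y}")

print("median of X:", median_label(group_x))
print("median of Y:", median_label(group_y))
```

      With evenly spaced codes, group X has the higher mean; with unevenly spaced codes, group Y does. Because ordinal data give no basis for preferring one spacing over the other, a comparison of means cannot be trusted here.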

      The high school principal has ordinal scales from some student surveys that he has already conducted, and he might conduct more. Although the case could be made that course grades really are ordinal, they have been treated as interval (the next topic) since their creation and still are. The debate is whether the difference in knowledge of a topic between two students scoring, say, 20 and 60 points on a test is the same as that between students scoring 60 points and 100 points.

      Along with actual medical data, the director of public health has results for perception surveys on the services received by the state’s medical assistance recipients. She also has another survey to be implemented fairly soon, a state requirement of her department. Most of her medical data, however, are either nominal or ratio, at least in how they are handled.

      For statistics appropriate to ordinal data, both the high school principal and the director of public health will use frequency counts for the responses to each of their surveys’ items and a form of chi-square (described a bit later) for statistical significance tests. They both will use medians and modes (also discussed later) to describe these central tendencies. Recognizing ordinal data for what they are can save many later headaches. Statisticians using ordinal data with statistics requiring interval data sometimes pay a harsh price in terms of their reputation.
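
      The descriptive part of this approach (frequency counts, median, and mode) can be sketched with Python's standard library. The survey item and its responses below are hypothetical, and the chi-square test is left out because it is described later. The median is taken with `median_low` on the rank of each response so that the result is always one of the actual response categories.

```python
from collections import Counter
from statistics import median_low, mode

# Ordered response scale, lowest to highest (hypothetical survey item)
scale = ["strongly disagree", "moderately disagree", "neutral",
         "moderately agree", "strongly agree"]
rank = {label: position for position, label in enumerate(scale)}

# Hypothetical responses from eight people
responses = ["neutral", "moderately agree", "strongly agree",
             "moderately agree", "moderately disagree",
             "moderately agree", "strongly disagree", "neutral"]

# Frequency count for each response choice
frequencies = Counter(responses)
for label in scale:
    print(f"{label:20} {frequencies[label]}")

# Median: sort by rank, take the middle position, map back to a label
median_response = scale[median_low([rank[r] for r in responses])]

# Mode: the most frequently chosen response
modal_response = mode(responses)

print("median:", median_response)
print("mode:", modal_response)
```

      Neither summary relies on the distances between response choices, only on their order (median) or their counts (mode).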

      5.C. Interval

       Interval is regular

       Same steps, no cracks

       Yet zero is not none

      Interval data have evenly spaced steps but no true zero. Course grades could be an example, where a zero score on a math test does not mean a complete lack of mathematics knowledge. A zero on a math test means that the student did not arrive at a single correct answer for the sample of possible relevant mathematics questions on the test, but the test has no way of capturing whether the student has no knowledge of the
