The Tao of Statistics. Dana K. Keller

are used with appropriately selected statistical techniques in contexts that address important questions in interpretable ways. In fact, we will see that the data and statistical issues facing both of our subjects are far more similar than they are different, although their contents greatly differ. The process of doing statistics in research projects changes very little across a wide variety of questions and fields of interest.

      The world of statistics starts with a question, not with data. The quality of the question, in combination with the quality of the data, largely determines the quality of the results. No surprise there, but quality research questions can be surprisingly rare in otherwise well-conducted studies, a situation that often leads to ambiguity in the results.

      2. Ambiguity—Statistics

       Not enough to know

       Just enough for a better guess

       Statistics are born

      Statistics help tame ambiguity by quantifying it (a point we will revisit a few times). In the world of statistics, if there is no ambiguity and no need to guess, we use population parameters. Where there is ambiguity, we use sample statistics. These terms have been shortened over the years to parameters and statistics. Statistics, then, are ways to make educated guesses. They might do so with a remarkable flair for Greek letters and long equations, yet they are guesses nonetheless.

      Samples have sampling error, again by definition. All statistics start from samples and have various amounts and kinds of sampling error. Samples can even be samples in time but must be samples of something bigger if you expect to produce a statistic. Because it is generally too expensive to measure everyone or everything representing a situation of interest, statistics are used throughout academic, professional, and everyday life.
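
      To make the distinction between a parameter and a statistic concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population is a made-up set of 10,000 test scores, and the helper name sampling_errors is hypothetical. It draws repeated random samples, compares each sample mean (a statistic) with the population mean (a parameter), and shows that the guesses tighten as the samples grow, with the error vanishing only when everyone is measured.

```python
import random
import statistics


def sampling_errors(population, sample_size, draws, seed=0):
    """Return (sample mean - population mean) for repeated random samples."""
    rng = random.Random(seed)
    parameter = statistics.fmean(population)  # known only because we hold the whole population
    errors = []
    for _ in range(draws):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        errors.append(statistics.fmean(sample) - parameter)
    return errors


# Hypothetical population: 10,000 made-up test scores (illustrative only).
population_rng = random.Random(42)
population = [population_rng.gauss(70, 10) for _ in range(10_000)]

# Spread of the sampling error for small and large samples.
spread_small = statistics.pstdev(sampling_errors(population, 25, 500))
spread_large = statistics.pstdev(sampling_errors(population, 400, 500))

# Measuring everyone leaves nothing to guess: the statistic equals the parameter.
census_error = sampling_errors(population, len(population), 1)[0]

print(f"typical error, samples of 25:  {spread_small:.2f}")
print(f"typical error, samples of 400: {spread_large:.2f}")
print(f"error when measuring everyone: {census_error:.2f}")
```

      The larger samples give better guesses, but they remain guesses. Only the full count, which is usually too expensive to collect, removes the sampling error.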

      When they are properly phrased, statistical results are hard to disprove because they do not contain absolutist language. This built-in ambiguity can be frustrating to those who want a strict yes-or-no answer, especially after they have waited until the data are collected and analyzed, sometimes at great cost. Unfortunately for these individuals, statistics are not meant to suit the convenience of the moment. Only so much can be properly inferred from a system of educated guesses, regardless of how carefully the guesses are made.

      It would not be much of an exaggeration to say that the world is run by statistics, or at least by people who get statistical information upon which to base their decisions. With the communications industry almost omnipresent in day-to-day life, people’s interest in and need to understand statistics have quietly mushroomed into a dominant feature of modern life. Why? Because statistics are used to answer people’s questions, and the answers reported (through TV, radio, newspapers, etc.) use statistics as evidence. Remember, research needs samples, and samples generate statistics.

      In our day-to-day lives, we use statistics without even knowing it. Those of us who own and drive a car guess whether we can make it to the next gas station based on what we know of the road conditions, typical gas mileage (a statistical issue, to be sure), and the consequences of running out of gas along the way. Nondrivers make equally statistical estimations and decisions, such as how much cash to keep on hand for a given weekend’s planned activities.

      In our first encounter with the high school principal, we see that he has data that are capable of addressing a wide variety of relevant academic and social questions. His questions are mostly about current academic achievement, but some are more future-oriented. He wants to use his data to support budgeting and expansion activities as well as to identify current problems for more immediate attention. He wants to become known for using science and statistics in his professional life due to his ambition to become a district superintendent someday.

      The principal considers all of his data to be samples, even when a casual observer might suggest otherwise. He wants to generalize to other classes and years. His 20 years at the school have shown him that changes in the makeup of each student class occur very slowly and subtly from year to year. For him, statistics are safer than having to take a harder stance. He likes to fall back on statistics being a science of quantifying ambiguity, and that means that his answers will be somewhat ambiguous, too.

      The director of public health will have access to very large and mostly representative samples to address her questions. Although her questions will be answered through her statewide electronic database, people are missing, for a variety of reasons, throughout the data. Yet her data are more representative than most and are likely to be consistent from year to year, even given any unknown sources of bias. Given the importance of year-to-year comparisons and the need to be sensitive to people moving in and out of the medical assistance program over time, she is quite pleased with the representativeness, completeness (the relative lack of missing data), and comprehensiveness (the availability of measures that address the characteristics important to her questions) of her data. For her, ambiguity is part of what gives her the luxury of testing different hypotheses for the potential impact of changes in public health policy. It has taken her a long time to gain professional credibility in a previously male-dominated field, and her reliance on carefully considered statistical methodologies is a central part of her strategy for maintaining and extending her leadership in the field. The changes in medical insurance and health care delivery since the introduction of the Affordable Care Act (ACA) require her to adjust her policies and programs to accommodate the influx of Medicaid beneficiaries within her databases. This changing landscape of people and policies adds a layer of ambiguity to her job that could not have been foreseen when she accepted her position with the State.

      Yet ambiguity holds one of the keys to the path of statistical knowledge. The ambiguity in the system means that no known solution fits perfectly. Ah, the challenge! Think of it! What is the best solution? Is the best solution the one with the smallest errors, or is it the most parsimonious one? What data are available? How good are they? Judgments fly. Decisions are made. The challenge in defining a statistical solution often is not to be the most correct but, instead, the least wrong! How? By means of a sharp question that cuts straight through ambiguity associated both with the statistical and methodological approaches and with the data themselves.

      The overarching message about statistics is that they are uncertain. Treated that way, statistics become more of an intellectual challenge, with less attachment and much less certainty. So, continue to relax and marvel at some close-up views of the foundation of statistics. Look at where the cracks are. Realize what those cracks could mean for edifices built on top. Smile, as together we experience more about this view of the world.

      3. Fodder—Data

       Observe

       Record

       More

      Data are what we hear, see, smell, taste, touch, and more. Data can even be what we sense. Data can represent anything and everything that we can discriminate well enough to distinguish from something else. In short, if it can be perceived, it can be coded and used as data.

      Data are the fodder of measurement, the backbone of statistics. Through a context, data become transformed into information. That context is a fusion of substantive
