The Tao of Statistics. Dana K. Keller


approach to gathering the data and the statistics used to derive meaning. A large part of the misuse of statistics is a nonreflective, uncritical crunching of numbers (i.e., data) to generate other, somewhat context-free, numbers. These uncritically examined results are then granted trusted status based on unfounded validity (discussed later in some detail). The result could be a poor decision or an ineffective policy, yet the statistics eventually are blamed. To become useful information, good data need to be placed in relevant contexts, with clear understandings of the strengths and weaknesses of the statistics and results.

      This relevant context is the frame of reference from which relative meaning is derived. To know whether something is big or small, there needs to be a question of compared with what? A blue whale is small compared with the planet. An ant is huge compared with atoms. This same issue of needing a frame of reference, a comparison point, is important to most types of knowledge that might be acquired through statistics. Several types of frames of reference exist in statistics, as we will see.

      One brief side note on the word data: Data is a plural word. Until very recently, the only proper grammatical use was as a plural noun, such as geese. Correctly, then, data are transformed through a context into information. A single piece of data is called a datum. With all that said, a recent English dictionary has recognized the common use of data as a singular noun and grants that use as a secondary preference.

      Modern databases can contain dozens of gigabytes of information—an amount that is truly staggering to consider. High-speed office computers can need hours just to run through the data once. Census data are now available across the Internet. From course catalogs to recent golf scores to real-time stock prices, data surround us as oceans surround fish. Data are everywhere and generally too common even to notice.

      Here is where the tao of statistics starts to take shape. Curiosity gives birth to questions that create the need for data that come from measures that people design to create meaning. We open our eyes with questions and perceive contextually rich data as probabilistic answers. Depending on how we ask our questions, how we look for and process the data, and how we place results in a meaningful context, the tone and the texture of the results will differ. Despite the most evenhanded intentions, unconscious biases can creep into even the best of research designs and processes. We will touch on this point several times in later chapters.

      The high school principal has student records in an electronic form, meaning that his data collection will be inexpensive. Having electronic student records also means that the principal has access to a wide variety of data for his students. Over the years, the school system has collected demographic data on its students. The principal also has the funds for conducting a survey on his essentially captive audience. Although he is not new as a principal, the extent of the electronic data available has him a bit intimidated. When he used to have to get the data from students’ physical files in the office, his “research” questions were quite modest and constrained. Now that he can get hundreds of times the amount of information with only a few mouse clicks, he is somewhat more reflective, less impulsive, and less likely to “just run the data” than he had expected to be.

      The director of public health has all of the state’s Medicaid information available to her electronically, and that information has expanded greatly since the 2014 provisions of the Affordable Care Act (ACA) were implemented. She also is authorized to conduct a single, limited survey if it can be seamlessly appended to one that is currently required by the state. She has less information on each person than does the high school principal, but the information she has is for a much larger number of people. When she accesses the data warehouse, she always pulls highly detailed (i.e., disaggregated) data. She knows that she can always collapse (i.e., aggregate) the data later, but she cannot do the reverse.
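The point about collapsing detailed data can be illustrated with a minimal Python sketch. The records, field names (county, age_group, claims), and counts below are invented for illustration and do not come from any real Medicaid data set.

```python
from collections import defaultdict

# Hypothetical disaggregated records: one row per county and age group.
records = [
    {"county": "A", "age_group": "0-17", "claims": 12},
    {"county": "A", "age_group": "18-64", "claims": 30},
    {"county": "B", "age_group": "0-17", "claims": 7},
    {"county": "B", "age_group": "18-64", "claims": 21},
]


def aggregate(rows, key):
    """Collapse detailed rows into claim totals for each value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["claims"]
    return dict(totals)


# The same detailed rows can be collapsed along either dimension.
by_county = aggregate(records, "county")
by_age = aggregate(records, "age_group")

print(by_county)
print(by_age)
```

Starting from the detailed rows, either summary is one step away. Starting from the county totals alone, the age breakdown cannot be recovered, which is why the director pulls the detailed data first.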

      Having data for large numbers of people and access to computers allows her to address important public health questions that would have gone unanswered not many years ago. Just as the principal has access to far more information than he used to have, the director of public health had that increase in access several years earlier. She is used to the amount and has started to understand the data’s strengths and limitations.

      Both the principal and the director of public health face the issue of data privacy for the individuals whose data they hold. Because she works in public health, the director is under far more scrutiny for confidentiality than the principal is, owing to the ever more challenging provisions of the Health Insurance Portability and Accountability Act (HIPAA). Well-established protocols exist for the proper handling of these issues for the principal, but the director finds herself challenged with a need to update her protocols almost annually. Remember, data privacy has ethical and legal standing. Expected processes and procedures exist and are also regularly updated for research involving people and their data. Keep the importance of this issue in mind when using or when reporting human subjects’ data. Ignorance is not a valid excuse, and the penalties for knowingly, or even unknowingly, releasing personal health information or personally identifiable information can be severe.

      4. Data—Measurement

       Perceivable

       Describable

       Scores

      If you can perceive it, you can measure it. A measurement is an assigned value for a single characteristic. The way a characteristic is captured and, therefore, the way its data should be interpreted determine the measure being used to address the question at hand. Some measures are more accurate than others. Perfect measurement exists only in fantasy; we do the best we can.

      Good measurement not only is sufficiently accurate but also places its objects into mutually exclusive categories or scores (or “codes”). Some measures divide people into categories, such as gender. Other measures are more abstract continua, such as perception scales that ask the extent to which a respondent agrees with a statement. Regardless of the type of measurement, sufficient accuracy and mutual exclusivity are needed. The rest of measurement is an extension of those simple concepts.
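As a rough illustration of mutual exclusivity, the Python sketch below assigns a score to a category and rejects any coding scheme in which a score lands in zero categories or in more than one. The category labels and cut points are hypothetical and chosen only for this example.

```python
def assign_category(score, bins):
    """Return the single label whose half-open range [low, high) contains score.

    Raises ValueError when the score falls into no category or into more
    than one, because either case means the coding scheme is unusable.
    """
    matches = [label for label, (low, high) in bins.items() if low <= score < high]
    if len(matches) != 1:
        raise ValueError(f"score {score} matched {len(matches)} categories: {matches}")
    return matches[0]


# Hypothetical coding scheme for a test score from 0 to 100.
exclusive_bins = {"low": (0, 50), "medium": (50, 80), "high": (80, 101)}

# The same scheme with a deliberate flaw: "low" and "medium" overlap on 45-49.
overlapping_bins = {"low": (0, 50), "medium": (45, 80), "high": (80, 101)}

print(assign_category(50, exclusive_bins))

try:
    assign_category(47, overlapping_bins)
except ValueError as err:
    print("rejected:", err)
```

With the overlapping scheme, a score of 47 could be coded either way, so two people coding the same data could produce different counts from identical records.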

      Data . . . the who, what, where, when, why, and how. Put the pieces together like a jigsaw puzzle, and voilà, you have meaningful information from what was a pile of otherwise useless facts. Be careful: In statistics, as in a jigsaw puzzle, cutting corners and force-fitting pieces can result in a very misleading picture. These shortcuts are sometimes difficult to notice and even more difficult to resolve.

      Along with grades, the principal’s school keeps information on standardized test scores, disciplinary actions, health records, extracurricular activities (e.g., clubs), and sporting achievements. Depending on state and federal laws, the principal will have varying levels of access to student records. To avoid having to accommodate some of the more sensitive aspects of HIPAA, he uses health and personal information in his work as little as possible.

      The director of public health has access to all of the public health and some other state databases. Again, her access is legally limited because the data are about health issues, such as immunizations and outbreaks of certain reported diseases. State and federal laws are quite strict on the access to and use of these types of data.

      5. Data Structure—Levels of Measurement

       What can be built?

       Ask
