Read book online - Medical Statistics. David Machin. Medicine. LiveLib


much consideration as analysis, and a statistician can provide advice on design. In a clinical trial, for example, what is known as a double‐blind randomised design is nearly always preferable (see Chapter 15), but not always achievable. If the treatment is an intervention, such as a surgical procedure, it might be impossible to prevent individuals knowing which treatment they are receiving but it should be possible to shield their assessors from knowing. We also discuss methods of randomisation and other design issues in Chapter 15.

Laboratory Experiments

Medical investigators often appreciate the effect that biological variation has in patients, but overlook or underestimate its presence in the laboratory. In dose–response studies, for example, it is important to assign treatment at random, whether the experimental units are humans, animals or test tubes. A statistician can also advise on quality control of routine laboratory measurements and the measurement of within‐ and between‐observer variation.
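As a minimal sketch of assigning treatment at random, the following shuffles a balanced list of dose labels over the experimental units. The function name `randomise`, the tube labels and the dose levels are hypothetical and chosen only for illustration; a real study would follow the randomisation methods discussed in Chapter 15.

```python
import random

def randomise(units, treatments, seed=None):
    """Assign treatments to units at random, in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    # Build a balanced list of treatment labels, then shuffle its order
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))

# Hypothetical example: 12 test tubes and three dose levels
tubes = [f"tube{i:02d}" for i in range(1, 13)]
allocation = randomise(tubes, ["0 mg", "5 mg", "10 mg"], seed=2024)

for tube, dose in allocation.items():
    print(tube, dose)
```

Shuffling a balanced list, rather than drawing a dose independently for each unit, guarantees equal group sizes while still leaving the allocation to chance.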

Displaying Data

A well‐chosen figure or graph can summarise the results of a study very concisely. A statistician can help by advising on the best methods of displaying data. For example, when plotting histograms, choice of the group interval can affect the shape of the plotted distribution; with too wide an interval important features of the data will be obscured; too narrow an interval and random variation in the data may distract attention from the shape of the underlying distribution. Advice on displaying data is given in Chapter 2.
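The effect of the group interval can be seen in a small simulation. The sketch below assumes invented data (a mixture of two Normal distributions, not taken from any study) and a hypothetical helper `histogram_counts`; it prints text histograms of the same values for three interval widths.

```python
import math
import random

def histogram_counts(values, width):
    """Count values in consecutive intervals of the given width, starting
    at the largest whole number that does not exceed the minimum value."""
    start = math.floor(min(values))
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return start, counts

# Simulated (invented) measurements with two underlying peaks
rng = random.Random(1)
values = ([rng.gauss(120, 5) for _ in range(100)]
          + [rng.gauss(150, 5) for _ in range(100)])

for width in (40, 5, 1):
    start, counts = histogram_counts(values, width)
    print(f"\ninterval width = {width}")
    for k, c in enumerate(counts):
        # Scale each bar relative to the tallest bar for this width
        bar = "#" * round(40 * c / max(counts))
        print(f"{start + k * width:>5} | {bar}")
```

With the widest interval the two peaks are merged into a few bars, while with the narrowest interval the bars become ragged because each contains only a handful of observations.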

Choice of Summary Statistics and Statistical Analysis

The summary statistics used and the analysis undertaken must reflect the basic design of the study and the nature of the data. In some situations, for example, a median is a better measure of location than a mean. (These terms are defined in Chapter 2.) In a matched study, it is important to produce an estimate of the difference between matched pairs, and an estimate of the reliability of that difference. For example, in a study to examine blood pressure measured in a seated patient compared with that measured when he or she is lying down, it is insufficient simply to report statistics for seated and lying positions separately. The important statistic is the change in blood pressure as the patient changes position and it is the mean and variability of this difference that we are interested in. This is further discussed in Chapter 7. A statistician can advise on the choice of summary statistics, the type of analysis and the presentation of the results.
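The blood pressure example can be sketched numerically. The readings below are hypothetical values invented for illustration, not data from any study; the point is only that the summary is computed on the within-patient differences rather than on the two positions separately.

```python
import math
import statistics

# Hypothetical systolic blood pressure (mmHg) for the same six patients
seated = [128, 141, 119, 135, 150, 124]
lying = [122, 138, 117, 127, 146, 121]

# The quantity of interest: the change within each patient
diffs = [s - l for s, l in zip(seated, lying)]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)            # variability of the differences
se_diff = sd_diff / math.sqrt(len(diffs))    # standard error of the mean difference

print("differences:", diffs)
print(f"mean difference = {mean_diff:.2f} mmHg, SD = {sd_diff:.2f}, SE = {se_diff:.2f}")

# For comparison: the spread between patients in each position separately
print(f"SD seated = {statistics.stdev(seated):.2f}, SD lying = {statistics.stdev(lying):.2f}")
```

In these invented data the differences vary far less than the readings in either position, because the large variation between patients cancels out when each patient acts as his or her own control.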

Medical Statistics and Data Science

Because of the availability of large amounts of data over the last few decades, the term data science has emerged to describe the substantial current intellectual effort around research with the goal of extracting information from these data. The type of data currently available in all sorts of application domains is often massive in size, very heterogeneous and far from being collected under designed or controlled experimental conditions. Nonetheless, it contains information, often substantial information, and it has been argued that data science is a new interdisciplinary approach that makes maximal use of this information. However, data alone is typically not that informative and (machine) learning from data needs conceptual frameworks. Data science would seem to encompass statistics. However, we would argue that statistics is crucial for providing conceptual frameworks that enhance the understanding of fundamental phenomena, highlight limitations and provide a formalism for properly founded data analysis, information extraction and quantification of uncertainty, as well as for the analysis and development of algorithms that carry out these key tasks.

As taught at a number of universities, data science differs from statistics in several ways. Statistics originated before the computer and its core concern is with statistical models. No serious statistician, however, is beguiled into confusing their model with reality (‘All models are wrong, but some are useful’, to quote the famous statistician George Box). Models are nevertheless very useful in describing how the world might be, and for making generalisations beyond the data. Data science is empirical and reliant on large data sets, whereas one of the key successes of statistics is doing inference on relatively small data sets, such as those available in agriculture and laboratories. Data science is often used for prediction, and the idea is that with the vast amounts of data now available electronically (such as those provided by national health services) one can look at empirical relationships and build up accurate predictors, such as how drugs will behave in individuals. These predictions are often highly successful but, without an underlying model, it can be difficult to know why a method makes particular predictions and how generalisable those predictions might be. Data science is related to the concept of ‘big data’. However, simply because a sample is large does not mean it is unbiased.
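That a large sample can still be biased is easy to demonstrate by simulation. The sketch below is purely illustrative: the population, its 20% prevalence and the recording probabilities are all invented assumptions. It compares a small simple random sample with a much larger sample in which people with the condition are less likely to be recorded.

```python
import random

rng = random.Random(42)

# Hypothetical population: 1 = has the condition, 0 = does not (true prevalence 20%)
population = [1] * 20_000 + [0] * 80_000
true_prev = sum(population) / len(population)

# Small simple random sample of 500 people
small_random = rng.sample(population, 500)

# Large but biased sample: assume people with the condition are recorded
# with probability 0.3 and people without it with probability 0.6
large_biased = [x for x in population if rng.random() < (0.3 if x else 0.6)]

prev_random = sum(small_random) / len(small_random)
prev_biased = sum(large_biased) / len(large_biased)

print(f"true prevalence          = {true_prev:.3f}")
print(f"small random sample      = {prev_random:.3f} (n = {len(small_random)})")
print(f"large biased sample      = {prev_biased:.3f} (n = {len(large_biased)})")
```

Increasing the size of the biased sample only makes the wrong answer more precise; the bias comes from how the sample was obtained, not from how many observations it contains.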

A case in point is the reported link between taking hormone replacement therapy (HRT) and lower heart disease rates observed in some large data sets. However, a key issue is whether women who use HRT are already more health conscious. It can be difficult to know whether this fact is adequately accounted for in conclusions drawn from the big data. Thus, it was only when the results of the randomised controlled trial of the use of HRT (Writing Group for the Women's Health Initiative Investigators 2002) became available that HRT was shown not to protect against heart disease. In fact, the trial identified an increased risk for total cardiovascular disease with hazard ratio 1.22 and 95% confidence interval 1.09 to 1.36 (the technical terms will be explained in Chapter 11). In this example, big data led to a wrong conclusion.
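As a rough check on the figures quoted above, one can work on the logarithmic scale. The sketch below assumes the interval was constructed as estimate ± 1.96 standard errors on the log hazard ratio scale, which is a common convention but is an assumption here rather than something stated in the text; the details are left to Chapter 11.

```python
import math

# Values quoted in the text
hr, lower, upper = 1.22, 1.09, 1.36

# Assumption: the 95% CI is log(HR) +/- 1.96 * SE on the log scale
se_log = (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Under that assumption the estimate lies midway between the limits on the log scale
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)

# Approximate test statistic for the hypothesis that the hazard ratio equals 1
z = math.log(hr) / se_log

print(f"implied SE of log(HR) = {se_log:.4f}")
print(f"midpoint of the limits on the log scale = {midpoint:.3f}")
print(f"approximate z statistic = {z:.2f}")
```

The interval lies wholly above 1 and the reconstructed midpoint agrees with the quoted hazard ratio to within rounding, which is consistent with the reported increase in risk.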

2 Displaying and Summarising Data

2.1 Types of Data

2.2 Summarising Categorical Data

2.3 Displaying Categorical Data

2.4 Summarising Continuous Data

2.5 Displaying Continuous Data

2.6 Within-Subject Variability

2.7 Presentation

2.8 Points When Reading the Literature

2.9 Technical Details

2.10 Exercises

Summary

This chapter describes different types of data that the reader is likely to encounter. It illustrates methods of summarising and displaying categorical data (bar charts,