Medical Statistics. David Machin
For example, the Association for Clinical Biochemistry and Laboratory Medicine gives a number of reference ranges in biochemistry, such as 3.5–5.3 mmol l−1 for serum potassium (labtestsonline 2019, https://labtestsonline.org.uk/articles/laboratory‐test‐reference‐ranges). This means that in a normal, healthy population we would expect 19 out of 20 people to have serum potassium levels within these limits. For the corn plaster example, we would expect about 95% of corns to be sized between 3.8 − 1.96 × 1.8 and 3.8 + 1.96 × 1.8 mm, that is between 0.3 and 7.3 mm (or 0.2 and 7.4 mm if the multiplier 1.96 is rounded to 2). Table 2.7 shows that there are 10 patients out of 200 (or 5%) who have a corn size above 7.4 mm and none below 1 mm; thus 95% of the observations in the data lie within two standard deviations of the mean.
Table 2.7 Frequency distribution of the size of the corn, in mm, at baseline for 200 patients with corns who were recruited to a randomised controlled trial of the effectiveness of salicylic acid plasters compared with ‘usual’ scalpel debridement for the treatment of corns
(Source: data from Farndon et al. 2013).
Size of corn at baseline (mm) | Frequency | Percentage | Cumulative percentage |
---|---|---|---|
1 to <2 | 6 | 3.0 | 3.0 |
2 to <3 | 39 | 19.5 | 22.5 |
3 to <4 | 52 | 26.0 | 48.5 |
4 to <5 | 42 | 21.0 | 69.5 |
5 to <6 | 38 | 19.0 | 88.5 |
6 to <7 | 10 | 5.0 | 93.5 |
7 to <8 | 3 | 1.5 | 95.0 |
8 to <9 | 5 | 2.5 | 97.5 |
9 to <10 | 1 | 0.5 | 98.0 |
10 to <11 | 4 | 2.0 | 100 |
Total | 200 | 100 | |
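The reference range arithmetic and the check against Table 2.7 can be reproduced in a few lines. The following is a minimal sketch rather than the authors' own calculation: the function name `reference_range` is introduced here for illustration, the mean (3.8 mm) and SD (1.8 mm) are the rounded values quoted in the text, and the '7 to <8' class is assumed to lie inside the limits, which is what the tally of 10 patients in the text implies.

```python
from itertools import accumulate

# Frequencies from Table 2.7, keyed by the lower bound (mm) of each 1 mm class.
frequencies = {1: 6, 2: 39, 3: 52, 4: 42, 5: 38, 6: 10, 7: 3, 8: 5, 9: 1, 10: 4}

mean, sd = 3.8, 1.8  # rounded summary values quoted in the text


def reference_range(mean, sd, multiplier=1.96):
    """Return the (lower, upper) limits mean -/+ multiplier * sd."""
    half_width = multiplier * sd
    return mean - half_width, mean + half_width


lower, upper = reference_range(mean, sd)                   # multiplier 1.96
lower2, upper2 = reference_range(mean, sd, multiplier=2)   # multiplier rounded to 2

total = sum(frequencies.values())

# Cumulative percentages, as in the last column of Table 2.7.
cumulative = [100 * c / total for c in accumulate(frequencies.values())]

# Assumption: only classes starting at 8 mm or more count as above the upper
# limit; the '7 to <8' class straddles the limit and is treated as inside.
above = sum(n for start, n in frequencies.items() if start >= 8)
# A class lies wholly below the lower limit if its upper bound does not exceed it.
below = sum(n for start, n in frequencies.items() if start + 1 <= lower)

percent_within = 100 * (total - above - below) / total

print(f"1.96 SD limits: {lower:.2f} to {upper:.2f} mm")
print(f"2 SD limits:    {lower2:.1f} to {upper2:.1f} mm")
print(f"above: {above}, below: {below}, within: {percent_within:.1f}%")
```

With the multiplier 1.96 the limits come out at about 0.27 and 7.33 mm, and with the multiplier 2 at 0.2 and 7.4 mm; because the table groups sizes into 1 mm classes, either pair of limits gives the same count of patients outside the range.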
As we have noted, standard deviation is often abbreviated to SD in the medical literature. Sometimes for emphasis we will denote it by SD(x), where the bracketed term x is included for a reason to be introduced later.
Means or Medians?
Means and medians convey different impressions of the location of data, and one cannot give a prescription as to which is preferable; often both give useful information. If the distribution is symmetric, then in general the mean is the better summary statistic. If it is skewed, then the median is less influenced by the tails and will better reflect a ‘typical’ individual. For example, if in a country median income is £20 000 and mean income is £24 000, most people will relate better to the former number.
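The effect of a long upper tail on the two summaries can be illustrated with a small made-up data set. This is only a sketch: the nine incomes below are hypothetical, chosen so that the median and mean match the £20 000 and £24 000 of the example, and they are not taken from any real survey.

```python
import statistics

# Hypothetical incomes in pounds, right-skewed by one high earner.
incomes = [12000, 15000, 18000, 20000, 20000, 22000, 25000, 30000, 54000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Make the tail more extreme: multiply the largest income by 10.
extreme = incomes[:-1] + [incomes[-1] * 10]
mean_extreme = statistics.mean(extreme)
median_extreme = statistics.median(extreme)

print(f"original: mean = {mean_income}, median = {median_income}")
print(f"extreme:  mean = {mean_extreme}, median = {median_extreme}")
```

Stretching the single largest value moves the mean from 24 000 to 78 000, while the median stays at 20 000, which is the sense in which the median is less influenced by the tails.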
It is sometimes stated, incorrectly, that the mean cannot be used with binary or ordered categorical data. However, as we have noted before, if binary data are scored 0/1 then the mean is simply the proportion of 1s. If the data are ordered categorical, then again the data can be scored, say 1, 2, 3, etc., and a mean calculated. This can often give more useful information than a median for such data, but should be used with care, because of the implicit assumption that the change from score 1 to 2, say, has the same meaning (value) as the change from score 2 to 3, and so on.
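Both points can be checked numerically. The sketch below uses hypothetical responses: the 0/1 outcomes, the category labels and the scores 1, 2, 3 are all assumptions made for illustration.

```python
import statistics

# Hypothetical binary outcomes scored 0/1 (1 = event occurred).
binary = [1, 0, 0, 1, 1, 0, 1, 1]
mean_binary = statistics.mean(binary)
proportion_of_ones = binary.count(1) / len(binary)

# Hypothetical ordered categories, scored 1, 2, 3. Equal spacing between
# the scores is the implicit assumption the text warns about.
scores = {"mild": 1, "moderate": 2, "severe": 3}
responses = ["mild", "mild", "moderate", "severe", "moderate",
             "mild", "severe", "severe", "moderate", "mild"]
scored = [scores[r] for r in responses]

mean_score = statistics.mean(scored)
median_score = statistics.median(scored)

print(f"mean of 0/1 data = {mean_binary}, proportion of 1s = {proportion_of_ones}")
print(f"ordinal mean = {mean_score}, ordinal median = {median_score}")
```

The mean of the 0/1 data equals the proportion of 1s (0.625), and for the scored categories the mean (1.9) separates this sample from one with a median of 2 but a different spread of responses, at the cost of assuming equally spaced scores.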
2.5 Displaying Continuous Data
A picture is worth a thousand words, or numbers, and there is no better way of getting a ‘feel’ for the data than to display them in a figure or graph. The general principle should be to convey as much information as possible in the figure, with the constraint that the reader is not overwhelmed by too much detail.
Dot Plots
The simplest method of conveying as much information as possible is to show all of the data and this can be conveniently carried out using a dot plot. It is also useful for showing the distributions in two or more groups side by side.
Example – Dot Plot – Baseline Corn Size
The data on corn size and treatment group (corn plaster or scalpel) are shown in Figure 2.5 as a dot plot. This method of presentation retains the individual subject values and clearly demonstrates any similarities or differences between the groups in a readily appreciated manner. An additional advantage is that any outliers will be detected by such a plot. However, such presentation is not usually practical with large numbers of subjects in each group because the dots will obscure the details of the distribution. Figure 2.5 shows that the two randomised groups had similar distributions of corn sizes at baseline.
Figure 2.5 Dot plot showing corn size (in mm) by randomised treatment group for 200 patients with corns.
(Source: data from Farndon et al. 2013).
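The idea of a dot plot, one dot per subject stacked at its observed value within each group, can be sketched in plain text. This is an illustrative sketch only: the corn sizes below are hypothetical and are not the trial data, and `text_dot_plot` is a helper name introduced here, not a function from any plotting library.

```python
from collections import Counter

# Hypothetical corn sizes (mm) for two groups; NOT the Farndon et al. data.
groups = {
    "plaster": [2, 3, 3, 4, 4, 4, 5, 6, 9],
    "scalpel": [2, 2, 3, 4, 4, 5, 5, 6, 10],
}


def text_dot_plot(groups, low, high):
    """Return the lines of a text dot plot: one row per value, one column per group."""
    counts = {name: Counter(values) for name, values in groups.items()}
    # Each column is wide enough for the group name and its longest row of dots.
    widths = {name: max(len(name), max(c.values(), default=0))
              for name, c in counts.items()}
    lines = ["mm  " + "  ".join(name.ljust(widths[name]) for name in groups)]
    # Largest value first, so the rows read like a vertical axis.
    for value in range(high, low - 1, -1):
        cells = [("." * counts[name][value]).ljust(widths[name]) for name in groups]
        lines.append(f"{value:>2}  " + "  ".join(cells))
    return [line.rstrip() for line in lines]


plot = text_dot_plot(groups, low=1, high=10)
print("\n".join(plot))
```

Every observation appears as its own dot, so isolated large values stand out immediately. With many subjects per group the rows of dots would become too long to read, which is the practical limitation noted in the text.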
Histograms
Patterns in a large data set of a numerically continuous variable may be revealed by forming a histogram. This is constructed by first dividing up the range of the variable into several non‐overlapping and equal intervals, classes, or bins, then counting the number of observations in each. A histogram for all the baseline corn sizes in the Farndon et al. (2013) trial data is shown in Figure 2.6. In this histogram the intervals correspond to a width of 1 mm. The area of each histogram block is proportional to the number of subjects in the particular corn size category. Thus, the total area in the histogram blocks represents the total number of patients. Relative frequency histograms allow