Medical Statistics. David Machin


up of different numbers of observations which may be useful when studies are compared.

Schematic illustration of the histogram of baseline index corn size for 200 patients with corns.

      (Source: data from Farndon et al. 2013).

      The choice of the number and width of intervals or bins is important. Too few intervals and much important information may be smoothed out; too many intervals and the underlying shape will be obscured by a mass of confusing detail. As a rule of thumb, it is usual to choose between 5 and 15 intervals, but the final choice will be based partly on a subjective impression of the resulting histogram. In the corn plaster trial the baseline corn size was recorded as a whole number of millimetres. In Figure 2.6 we have 10 intervals or bins of width 1 mm, which fits our rule of thumb. In this example an interval of 1–1.99 mm covers bin 1, 2–2.99 mm covers bin 2, etc. Histograms with bins of unequal interval length can be constructed, but they are usually best avoided.
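The effect of bin width can be sketched in a few lines of Python. This is only an illustration: the `sizes_mm` values are hypothetical (they are not the trial data), and `bin_counts` is a helper written for this example rather than a library function.

```python
from collections import Counter

# Hypothetical corn sizes in mm (illustrative only, not the trial data).
sizes_mm = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 10]

def bin_counts(values, width, start):
    """Count values in half-open bins [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

narrow = bin_counts(sizes_mm, width=1, start=1)  # ten bins of width 1 mm
wide = bin_counts(sizes_mm, width=5, start=1)    # two bins of width 5 mm

# Crude text histogram for the 1 mm bins.
for k, count in enumerate(narrow):
    lower = 1 + k
    print(f"{lower:>2}-{lower + 0.99:.2f} mm | {'*' * count}")

print("5 mm bins:", wide)
```

With only two wide bins the shape of the distribution is largely smoothed away, whereas the ten 1 mm bins still show where the values pile up.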

       Box and Whisker Plot

      A box and whisker plot contains five pieces of summary information about the data: the median, the upper quartile, the lower quartile, and the maximum and minimum values. If the number of points is large, a dot‐plot can be replaced by a box and whisker plot, which is more compact than the corresponding histogram.
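As a minimal sketch, the five summary values can be computed with Python's standard `statistics` module. The data below are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption made for this example: several definitions of a quartile are in use, and software packages can give slightly different answers for small samples.

```python
from statistics import quantiles

# Hypothetical corn sizes in mm (illustrative only, not the trial data).
sizes_mm = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9]

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # n=4 splits the data into quarters and returns the three cut points.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

summary = five_number_summary(sizes_mm)
labels = ["minimum", "lower quartile", "median", "upper quartile", "maximum"]
for label, value in zip(labels, summary):
    print(f"{label:>14}: {value}")
```

The box in the plot spans the lower to the upper quartile, with a line at the median.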

       Illustrative Example – Box and Whisker Plot – Corn Size at Baseline by Randomised Group

Schematic illustration of the box and whisker plot of size of corn at baseline by randomised group for 200 patients with corns.

      (Source: data from Farndon et al. 2013).

       Scatter Plots

      When one wishes to illustrate a relationship between two continuous variables, a scatter plot of one against the other may be informative.

       Illustrative Example – Scatter Plot – Baseline Corn Size by Corn Size at a Three month Follow‐up

Schematic illustration of the scatter plot of baseline corn size by corn size at a three month follow-up for 181 patients with corns.

      (Source: data from Farndon et al. 2013).

      It is likely that baseline corn size will have an influence on corn size at three months, but the reverse cannot be the case. When one variable, x (baseline corn size), could cause the other, y (three‐month corn size), it is usual to plot the x variable on the horizontal axis and the y variable on the vertical axis.

      In contrast, if we were interested in the relationship between baseline corn size and height of the patient then either variable could cause or influence the other. In this example it would be immaterial which variable (corn size or height) is plotted on which axis.

      Measures of Symmetry

Graphs depict examples of two skewed distributions.

      For the corn size data, the mean from the 200 patients is 3.8 mm and the median is 4 mm, so we conclude the data are reasonably symmetric. One is more likely to see skewness when the variables are constrained at one end or the other. For example, waiting time or time in hospital cannot be negative, but it can be very long for some patients while relatively short for the majority, and so it is likely to be right (positively) skewed.
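Comparing the mean with the median gives a rough check on symmetry, which can be sketched as follows. Both data sets and the `tolerance` threshold are hypothetical choices for this illustration, and `skew_hint` is a helper written here rather than a standard function. It is a heuristic, not a formal test.

```python
from statistics import mean, median

def skew_hint(values, tolerance):
    """Rough heuristic: compare mean and median, in the units of the data."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "right (positive) skew"
    if gap < -tolerance:
        return "left (negative) skew"
    return "roughly symmetric"

# Hypothetical values, for illustration only.
corn_like_mm = [2, 3, 3, 4, 4, 4, 5, 5, 6]
waiting_days = [1, 1, 2, 2, 2, 3, 3, 4, 5, 30]

print("corn-like sizes:", skew_hint(corn_like_mm, tolerance=0.5))
print("waiting times:  ", skew_hint(waiting_days, tolerance=0.5))
```

In the waiting-time list a single long stay pulls the mean well above the median, which is the pattern described for variables that are bounded below.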

      A common skewed distribution is annual income, where a few high earners pull up the mean, but not the median. In the UK about 68% of the population earn less than the average wage; that is, the mean value of annual pay is equivalent to the 68th percentile of the income distribution. Thus, many people who earn more than the median (that is, more than 50% of the population) will still feel underpaid!
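To see how the mean can sit well above the 50th percentile, consider a small sketch with made-up figures. The `incomes` list is hypothetical (it is not UK data and will not reproduce the 68% figure), and `share_below` is a helper defined only for this example.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands (illustrative only, not UK data).
incomes = [15, 18, 20, 22, 25, 28, 30, 35, 60, 147]

def share_below(values, threshold):
    """Proportion of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

avg = mean(incomes)
mid = median(incomes)

print(f"mean = {avg}, median = {mid}")
print(f"share earning less than the mean:   {share_below(incomes, avg):.0%}")
print(f"share earning less than the median: {share_below(incomes, mid):.0%}")
```

Here one very high earner lifts the mean so that most of the group fall below it, while by construction half fall below the median.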
