Medical Statistics. David Machin


Figure 2.1, measurements were made only once for each subject. Thus the variability, expressed, say, by the standard deviation, is the between‐subject variability. If, however, measurements are made repeatedly on one subject, we are assessing within‐subject variability.

      Illustrative Example – Within‐Subject Variability – Total Steps per Day

Figure: Total steps per day for 100 days for one participant in a global corporate challenge designed to increase physical activity.

      If another subject had also completed this experiment, we could calculate their within‐subject variation as well, and perhaps compare the variabilities for the two subjects using these summary measures. For example, suppose a second subject had a mean step count of 12 745 with a standard deviation of 4861 steps; this subject would have a smaller mean but similar variability.
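The comparison described above can be sketched in a few lines. This is a minimal illustration only: the step counts for both subjects are hypothetical values invented for the example (they are not the data behind the figure), and `within_subject_summary` is a helper name introduced here.

```python
from statistics import mean, stdev

# Hypothetical daily step counts for two subjects (not data from the book).
subject_a = [15200, 9800, 21050, 13400, 17650, 8900, 19300]
subject_b = [12100, 7600, 18900, 10400, 15800, 6900, 17500]


def within_subject_summary(steps):
    """Return (mean, sample standard deviation) of one subject's repeated measurements."""
    return mean(steps), stdev(steps)


for name, steps in (("A", subject_a), ("B", subject_b)):
    m, s = within_subject_summary(steps)
    print(f"Subject {name}: mean = {m:.0f} steps, within-subject SD = {s:.0f} steps")
```

Each standard deviation here is computed from repeated measurements on a single person, so it describes within‐subject variability only.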

      Successive within‐subject values are unlikely to be independent; that is, consecutive values will depend on the values preceding them. For example, if a sedentary or inactive person records a low step count on one day, the count is likely to be low on the next day as well. This does not imply that the step count will be low, only that it is a good bet that it will be. In contrast, examples can be found in which high step counts are usually followed by lower values and vice versa. With independent observations, the step count on one day gives no indication or clue as to the step count on the next.
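One common way to quantify this kind of dependence is the lag‐1 autocorrelation, which the text does not name but which matches the idea described. The sketch below assumes equally spaced daily readings; both series are hypothetical, and `lag1_autocorrelation` is an illustrative helper.

```python
from statistics import mean


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a series.

    Positive when neighbouring values tend to be alike, negative when
    high values tend to be followed by low ones, and close to zero
    when successive values are independent.
    """
    m = mean(values)
    deviations = [v - m for v in values]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Hypothetical series: low days cluster together, then high days do.
persistent = [3000, 3200, 3100, 3500, 6000, 6400, 6100, 6300]
# Hypothetical series: a high day is usually followed by a low day.
alternating = [9000, 3000, 9500, 2800, 9200, 3100, 8800, 3300]

print(f"persistent:  {lag1_autocorrelation(persistent):.2f}")
print(f"alternating: {lag1_autocorrelation(alternating):.2f}")
```

The first series gives a positive value and the second a negative one, corresponding to the two patterns of dependence described in the paragraph.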

      If successive observations on a patient with heart disease taken over time fluctuate around some more or less constant daily step count, then the particular level may be influenced by factors within the patient. For example, step counts (and physical activity levels) may be affected by a viral infection whose presence is unrelated to the cause of the heart disease itself. Levels may also be influenced by the severity of the underlying condition and whether concomitant treatment is necessary for the patient. Levels could also be influenced by other factors, for example, alcohol and tobacco consumption, and diet. The cause of some of the variation in step counts may be identified and its effect on the variability estimated. Other variation may have no obvious explanation and is usually termed random variation. This does not necessarily imply there is no cause of this component of the variation but rather that its cause has not been identified or is being ignored.

      Different patients with heart disease observed in the same way may have differing average levels of step counts (physical activity levels) from each other but with similar patterns of variation about these levels. The variation in mean step count levels from patient to patient is termed between‐subject variation.
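The distinction can be made concrete with a small sketch. The three patients and their step counts below are hypothetical, chosen so that the patients differ in average level but vary about their own level in the same way; `variation_summary` is a name introduced for illustration.

```python
from statistics import mean, stdev

# Hypothetical daily step counts for three patients (not data from the book).
patients = {
    "P1": [4200, 3900, 4600, 4100, 4400],
    "P2": [7100, 6800, 7500, 7000, 7300],
    "P3": [9900, 9600, 10300, 9800, 10100],
}


def variation_summary(data):
    """Return per-subject means, per-subject SDs, and the SD of the subject means."""
    subject_means = {pid: mean(steps) for pid, steps in data.items()}
    within_sds = {pid: stdev(steps) for pid, steps in data.items()}
    # Between-subject variation: spread of the subjects' average levels.
    between_sd = stdev(list(subject_means.values()))
    return subject_means, within_sds, between_sd


means, within_sds, between_sd = variation_summary(patients)
for pid in patients:
    print(f"{pid}: mean = {means[pid]:.0f}, within-subject SD = {within_sds[pid]:.0f}")
print(f"Between-subject SD of the means = {between_sd:.0f}")
```

Taking the standard deviation of the subject means is only the simplest summary of between‐subject variation; a formal analysis would typically separate the two components with a variance components model.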

      Observations on different subjects are usually regarded as independent. That is, the data values on one subject are not influenced by those obtained from another. This, however, may not always be the case, particularly with subjective measures such as pain or quality of life, which may be influenced by the subject's personal judgement; different patients may, for example, assist each other when recording their quality of life.

      Graphs

      In any graph there are clearly certain items that are important. For example, scales should be labelled clearly with appropriate dimensions added. The plotting symbols are also important; a graph is used to give an impression of pattern in the data, so bold and relatively large plotting symbols are desirable. This is particularly important if the graph is to be reduced for publication purposes or presented as a slide in a talk.

      A graph should never include too much clutter; for example, many overlapping groups each with a different symbol. In such a case it is usually preferable to give a series of graphs, albeit smaller, in several panels.

      The choice of scales for the axes will depend on the particular data set. If transformations of the axes are used, for example, plotting on a log scale, it is usually better to mark the axes using the original units as this will be more readily understood by the reader. Breaks in scales should be avoided. If breaks are unavoidable, under no circumstances must points on either side of a break be joined. If both axes have the same units, then use the same scale for each. If this cannot be done easily, it is sensible to indicate the line of equality, perhaps faintly in the figure.

      False impressions of trend, or lack of it, in a time plot can sometimes be introduced by omitting the zero point of the vertical axis. This may falsely make a mild trend, for example a change from 101 to 105, into an apparently strong trend (seemingly as though from 1 to 5). There must always be a compromise between clarity of reproduction, that is, filling the space available with data points, and clarity of message. Appropriate measures of variability should also be included. One such measure is to indicate the range of values covered by two standard deviations each side of a plotted mean.
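The effect of omitting the zero point can be checked with simple arithmetic on the 101 to 105 example. The sketch below compares how large the change looks, measured as the ratio of plotted heights above the start of the vertical axis; `apparent_ratio` and its `axis_start` parameter are names introduced here for illustration.

```python
def apparent_ratio(first, last, axis_start=0.0):
    """Ratio of the plotted heights of two values above the start of the axis."""
    if first <= axis_start:
        raise ValueError("axis_start must lie below the first value")
    return (last - axis_start) / (first - axis_start)


# Vertical axis starting at zero: the change looks mild.
print(f"axis from 0:   {apparent_ratio(101, 105):.2f}")
# Vertical axis starting at 100: the same change looks like 1 to 5.
print(f"axis from 100: {apparent_ratio(101, 105, axis_start=100):.2f}")
```

With the axis starting at zero the second point is plotted about 4% higher than the first; with the axis starting at 100 it is plotted five times as high, although the data are unchanged.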

      It is important to distinguish between a bar chart and a histogram. Bar charts display counts in mutually exclusive categories, and so the bars should have spaces between them. Histograms show the distribution of a continuous variable and so should not have spaces between the bars. It is not acceptable to use a bar chart to display a mean with standard error bars (see Chapter 6). A mean should instead be shown as a data point surrounded by error bars or, better still, a 95% confidence interval.
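As a rough sketch of the quantities that would be plotted as a point with an interval, the following computes a mean and an approximate 95% confidence interval. It assumes the large‐sample normal approximation (a multiplier of about 1.96 standard errors); for small samples a t‐based multiplier gives a wider interval. The data and the name `mean_with_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation confidence interval."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # Two-sided multiplier from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * standard_error, m + z * standard_error


# Hypothetical daily step counts for one group (not data from the book).
steps = [11200, 9800, 14300, 12750, 10400, 13900, 12100, 11650]
m, lower, upper = mean_with_ci(steps)
print(f"mean = {m:.0f}, 95% CI = ({lower:.0f}, {upper:.0f})")
```

The three returned values are what a plot would show as the data point and the two ends of its interval.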

      With currently available graphics software
