The problem with both types of control groups is that there could be systematic differences between the patients in the different groups. Isaac Massey, a contemporary of Jurin, made this criticism of the work on smallpox, pointing out that those who could afford variolation may have been in better health than those in the comparison groups [22]. He concluded that what was needed was groups that were similar: they ‘must and ought to be as near as may be on a Par’ [22].
COMPARING SIMILAR GROUPS
When groups are similar at baseline, it is more likely that any differences in subsequent outcomes will be due to differences in the effects of the treatments. The idea of comparing like with like was proposed in the fourteenth century by the poet Francesco Petrarch, who suggested using similar groups of patients to compare the then-current treatments with simply letting nature take its course [26].
One way to create similar groups is to recruit a number of patients who are all alike, then give them different treatments. The testing of potential treatments for scurvy is a widely cited example of the benefit of using similar groups. Scurvy is a debilitating and sometimes fatal disease, which afflicted sailors on long‐distance sea voyages from the fifteenth to the nineteenth centuries [27, 28]. By the late 1500s, the benefits of consuming oranges and lemons were well known to Dutch sailors [27], but many English expeditions continued to suffer serious loss of life through scurvy [28]. The issue was still unresolved in 1747 when James Lind, a Royal Navy surgeon, carried out a classic study to assess the effects of six common treatments. He identified 12 sailors with scurvy who were ‘as similar as I could have them’, and tested each of the treatments on a group of 2 men (each pair received one of: oil of vitriol, vinegar, sea water, cider, oranges and lemons, or a herbal paste) [29]. After 14 days Lind observed that ‘the most sudden and visible good effects were perceived from the use of oranges and lemons’. These findings were not widely accepted, and even Lind had doubts about them [29, 30], but the method used reflects an advance in thinking about ways to test treatments. Lind is rightly celebrated for his comparison of like with like in the evaluation of treatments. (In his ‘Treatise of the Scurvy’ Lind does not make any clear recommendations for the treatment of the disease, possibly because he believed that scurvy was not due to poor diet, but was a result of faulty digestion exacerbated by wet weather [29, 30].)
Another study in the eighteenth century used similar groups to assess whether the adverse effects of variolation (to prevent smallpox) could be ameliorated by pretreatment with a compound of mercury. At that time about 1 out of 50 patients variolated against smallpox died following the procedure [31]. In 1767 William Watson recruited 31 children who were similar in age, gender and diet [32]. These were divided into three groups, which received either the mercury mixture, a mild senna laxative or no treatment. No clear difference was found between the groups, using an objective measure of assessment (the number of pock marks caused by the variolation). Watson concluded that variolation against smallpox was effective with or without pretreatment with mercury or a mild laxative [32].
CASTING LOTS AND TREATMENT ALLOCATION
Comparing similar groups of patients was an important step forward in the evaluation of treatments, but it leaves open the possibility that the groups may have differed on important factors that were not measured. Further, a subconscious bias in the doctor allocating patients to treatments could influence the way individuals were assigned to groups (e.g. the slightly sicker ones might be preferentially assigned to one group). An alternative approach, which prevents this bias, is to allocate individuals to treatments in a truly random way, so that the final groups will tend to be balanced on all factors, whether measured or not.
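Allocation by chance can be sketched in a few lines of Python. This is only an illustration: the function name `allocate_randomly`, the patient labels and the fixed seed are assumptions made for the example, and shuffling then splitting a list is just one of several ways to randomise.

```python
import random

def allocate_randomly(patients, seed=None):
    """Shuffle the patients and split them into two groups of near-equal size."""
    rng = random.Random(seed)      # a seed makes the example reproducible
    shuffled = list(patients)      # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical patient labels, for illustration only.
patients = [f"patient_{i:02d}" for i in range(1, 13)]
treatment, control = allocate_randomly(patients, seed=1)
```

Because chance alone decides who goes where, the person running the study has no opportunity to steer sicker or healthier patients towards either group.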
The idea that some form of randomisation should be used to allocate patients to treatment groups was proposed in the 1640s. Joan Baptista van Helmont, a Flemish chemist, alchemist and physician, recommended this method to evaluate the effectiveness of bloodletting [33]. He suggested dividing up to 500 patients into 2 groups, then casting lots (equivalent to tossing a coin) to decide which group would be given the conventional therapy (bloodletting) and which would receive van Helmont's own treatment. A notable feature of the trial design is that the outcome would be decided by the number of funerals that occurred in the two groups. The experiment was not carried out. (The proposed use of an objective outcome measure such as this is unusual for its time.)
One method of randomised allocation was used in 1848 by Thomas Graham Balfour to investigate whether homeopathic belladonna could prevent scarlet fever. Balfour identified 151 boys who had not had the disease, and ‘divided them into two sections, taking them alternately from the list, to prevent the imputation of selection’ [34]. Balfour recognised that if he had to decide which boys were allocated to each group, his choices might be biased. (Alternate selection from a list is essentially a method of randomisation, as the factors which are related to dying from scarlet fever will be randomly scattered throughout the list.) The study showed that exactly two children in each group developed scarlet fever, leading him to conclude that ‘the numbers are too small to justify deductions as to the prophylactic power of belladonna’ [34], a commendably careful interpretation of the findings.
Instead of alternate selection from a list, patients could be allocated to treatments by the date of their admission to hospital. This method was used by the Danish physician Johannes Fibiger in 1896–1897 [35] to evaluate the effectiveness of a serum treatment for diphtheria. Patients admitted to hospital on one day received serum and those admitted on the next day were untreated. The outcome was persuasive: only 8 of 239 patients in the serum group died, compared to 30 of the 245 controls.
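The size of the difference in Fibiger's trial is easier to see when the counts are turned into risks. The short calculation below uses only the figures quoted above; the variable names are chosen for illustration, and the risk ratio is a modern summary measure, not one reported in the original study.

```python
# Deaths and group sizes as quoted in the text.
deaths_serum, n_serum = 8, 239
deaths_control, n_control = 30, 245

risk_serum = deaths_serum / n_serum        # proportion dying with serum
risk_control = deaths_control / n_control  # proportion dying without serum
risk_ratio = risk_serum / risk_control

print(f"Serum group:   {risk_serum:.1%} died")    # 3.3%
print(f"Control group: {risk_control:.1%} died")  # 12.2%
print(f"Risk ratio:    {risk_ratio:.2f}")         # 0.27
```

On these figures, the risk of death in the serum group was roughly a quarter of that in the control group.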
The use of alternate allocation began to gain popularity in the first few decades of the twentieth century because it reduced bias in the assignment of patients to treatments. Studies using this method were conducted in both the United States and the UK, with patients being allocated according to the order of their attendance at a healthcare facility [36–39]. These trials signalled the growing recognition of the importance of achieving comparable groups.
RANDOM NUMBERS FOR TREATMENT ALLOCATION
A landmark series of three trials, conducted under the auspices of the UK Medical Research Council, used random numbers to allocate patients to treatments. This methodological advance was proposed by the medical statistician Professor (later Sir) Austin Bradford Hill [40]. It was first used in a large field trial that assessed the effectiveness of a vaccine for whooping cough [41]. Although this study began in 1944, it was not published until 1951. The second trial, of streptomycin for pulmonary tuberculosis, became the most highly acclaimed study in the history of treatment evaluation. It began in 1946, but was the first to be published, in 1948 [42]. The third was a large‐scale field trial of an antihistaminic drug (thonzylamine) for the prevention of the common cold [43].
As well as being published first, the streptomycin trial provided a major advance in the treatment of a feared disease, tuberculosis: it reduced the fatality rate at six months from 27% to 7% and also reduced the severity of disease among survivors. An editorial that accompanied the paper identified the advantage of individual randomisation over alternate allocation: it prevented a patient from being included or rejected on the basis of whether the next treatment was to be antibiotic or control [44]. For example, a doctor who thought that the drug would not be effective in seriously ill patients might exclude such patients from the study whenever they were next in line to receive the active treatment. This would only need to happen a few times to bias the results of the study.
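The editorial's point can be illustrated with a small simulation. This is a hypothetical sketch, not a reconstruction of any real trial: the function `enrol`, the 30% rate of serious illness and the behaviour of the biased doctor are all assumptions made for the example.

```python
import random

def enrol(severities, next_arm_known, rng):
    """Enrol patients one at a time; return the number of severe cases per arm.

    severities: list of booleans, True if the patient is seriously ill.
    next_arm_known: True for alternate allocation (the doctor can see which
    arm comes next), False for concealed random allocation.
    """
    arms = {"drug": [], "control": []}
    next_arm = "drug"
    for severe in severities:
        if next_arm_known:
            # Hypothetical biased doctor: turns away seriously ill patients
            # whenever the next slot is the active drug.
            if severe and next_arm == "drug":
                continue  # patient excluded; the slot stays open
            arms[next_arm].append(severe)
            next_arm = "control" if next_arm == "drug" else "drug"
        else:
            # The arm is chosen by chance only after the patient is accepted,
            # so the doctor has nothing on which to base an exclusion.
            arms[rng.choice(["drug", "control"])].append(severe)
    return {arm: sum(group) for arm, group in arms.items()}

rng = random.Random(0)
# Assumed for illustration: 1000 patients, about 30% seriously ill.
severities = [rng.random() < 0.3 for _ in range(1000)]
alternate = enrol(severities, next_arm_known=True, rng=rng)
concealed = enrol(severities, next_arm_known=False, rng=rng)
print("Severe cases, alternate allocation:", alternate)
print("Severe cases, random allocation:   ", concealed)
```

Under alternate allocation the drug group ends up with no seriously ill patients at all, by construction of the doctor's rule, so the drug would look better than it is. With random allocation the seriously ill are spread across both groups by chance. The sketch takes the extreme case in which the doctor always excludes; as noted above, a few such exclusions would be enough to bias a real study.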
In addition to the use of random numbers to allocate patients to treatments, these three trials stand out for two