Business Experiments with R. B. D. McCullough


whether there is enough data to draw a conclusion. Testing for significance is one of the tools we use in analyzing A/B tests, and Chapter 2 will show you how to do it. As we will explain in the next few sections, we need more than just the lift numbers to perform the significance test.

      Most website testing managers will tell you that more than half of the website tests that they run are not significant, meaning that they cannot conclude that one version is better than the other. For example, in the video icon test in Figure 1.7, there were no significant differences in the percentage of users who viewed the product detail pages or in the average sales per session. If we looked at the raw data, we would probably find some small differences, but they were not large enough to rise to the level of significance. The analyst has wisely chosen not to report the lift numbers and instead simply said, “there was no significant difference.” While the manager who came up with this video icon idea might not be too happy to find that it doesn't work, it is important to know that it doesn't work so that attention can be shifted to more promising improvements to the website. Smart testing managers realize that it is important to run many tests to find the features of the website that really do change user behavior.
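
      To illustrate why the lift number alone cannot settle significance, here is a minimal sketch in Python of a two-sided, two-proportion z-test. It is one common way to compare conversion rates and is not necessarily the method the book develops in Chapter 2. The function name and all counts below are hypothetical and chosen only for illustration: both scenarios have the same 10% lift, but very different sample sizes.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between A and B.

    Returns (lift, z, p_value). Hypothetical helper for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference in proportions under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Lift: relative change of B over A
    lift = (p_b - p_a) / p_a
    return lift, z, p_value

# Made-up data: 10.0% vs 11.0% conversion in both cases
small = two_proportion_z_test(100, 1000, 110, 1000)
large = two_proportion_z_test(10000, 100000, 11000, 100000)

for label, (lift, z, p_value) in [("small", small), ("large", large)]:
    print(f"{label}: lift={lift:.1%}, z={z:.2f}, p-value={p_value:.3g}")
```

      With 1,000 users per group, the p-value is well above 0.05, so the 10% lift is not significant. With 100,000 users per group, the same lift yields a p-value far below 0.05. The sample sizes, not just the lift, determine whether a conclusion can be drawn.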

      Exercises

      1.5.1 Visit a retail website and identify five opportunities for A/B tests on the website. For each test, clearly define the A and B treatments that you would test and identify a response variable to measure performance.

      1.5.2 Find an article that reports the results of a medical experiment, a business experiment, or a psychological experiment. How are the results reported? Do they use a graph to display the data? Does the article indicate whether the difference between treatments was significant?

      1.5.3 Visit a retail store and identify five opportunities for A/B tests. For each test, clearly define the A and B treatments that you would test and identify a response variable to measure performance.

      Experiments are as old as the Bible. From the Book of Daniel (1, 11‐16),

      Daniel then said to the guard whom the chief official had appointed over Daniel, Hananiah, Mishael, and Azariah, “Please test your servants for ten days: Give us nothing but vegetables to eat and water to drink. Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see.” So he agreed to this and tested them for ten days. At the end of the ten days they looked healthier and better nourished than any of the young men who ate the royal food. So the guard took away their choice food and the wine they were to drink and gave them vegetables instead.

      The first clinical trial was conducted in 1747 by the Scottish physician James Lind, who was trying to find a cure for scurvy. Scurvy was a serious problem, since it killed more British sailors than did the French and the Spanish combined. After two months at sea, when the men were afflicted with scurvy, Lind divided 12 sick sailors into six groups of two. Each day, five of the groups were administered, respectively, cider, 25 drops of sulfuric acid, vinegar, a cup of seawater, and barley water, and the final group received two oranges and one lemon. After six days the fruit ran out, but by then one of the two sailors who had received the fruit was completely recovered and the other was almost recovered.

      Randomization was introduced into experimental design in the nineteenth century by Peirce and Jastrow (1885); many people incorrectly attribute this to R. A. Fisher in the twentieth century. Designed experiments that were not randomized, but instead were balanced, were developed in the early twentieth century by William Gosset, writing pseudonymously as “Student” (he also invented Student's t‐distribution). Fisher later popularized randomized experiments in the 1920s. Much of the work on the design of experiments in the early twentieth century focused on agriculture, and it was extremely fruitful. Moses and Mosteller (1997, p. 217) noted,

      The development of greatly increased agricultural productivity in the twentieth century has rested largely on field experiments in which new varieties of crops (and new agricultural practices) are compared to standard ones. So important is this empirical testing to agricultural progress that a large part of modern statistical design of experiments actually grew up in the context of agricultural experimentation.

Grid chart depicting the increase in US agricultural production (output and input) in the twentieth century, from the year 1910 to 2000.

      The next field to be revolutionized by the design of experiments was manufacturing. Chemical production greatly increased as a result of designed experiments, and the practice spread to other process industries. Variability in successive batches of output was decreased, allowing for a more uniform, higher quality product and a concomitant decrease in waste. In the 1950s, the statistical pioneer W. Edwards Deming taught statistical methods to Japanese manufacturers at a time when “made in Japan” was synonymous with “low quality.” Deming taught them to greatly increase quality and output, especially in the automotive and electronics industries.

Grid chart depicting the remarkable growth of American imports of Japanese cars from the 1960s through the 1980s.

      1.7.1 Define the following terms: independent variable /
