Practical Data Analysis with JMP, Third Edition. Robert Carver
Чтение книги онлайн.
Читать онлайн книгу Practical Data Analysis with JMP, Third Edition - Robert Carver страница 14
Similarly, consider a large retail company that has a “customer loyalty” program, offering discounts to its regular customers who present their bar-coded key tags at the check-out counter. Suppose the firm wants to nudge customers to return to their stores more frequently and generates discount coupons that can be redeemed if the customer visits the store again within so many days. The marketing analysts in the company could design an experiment in which they vary the size of the discount and the expiration date of the offer, issue the different coupons to randomly chosen customers, and then see when customers return.
Experimental Data—An Example1
In an experimental design, the causal variables are called factors, and the outcome variable is called the response variable. As an illustration of a data table containing experimental data, open the data table called Concrete. Professor I-Cheng Yeh of Chung-Hua University in Taiwan measured the compressive strength of concrete prepared with varying formulations of seven different component materials. Compressive strength is the amount of force per unit of area, measured here in megapascals that the concrete can withstand before failing. Think of the concrete foundation walls of a building; they need to be able to support the mass of the building without collapsing. The purpose of the experiment was to develop an optimal mixture to maximize compressive strength.
1. Select File ► Open. Choose Concrete and click OK.
Figure 2.4: The Concrete Data Table (Within Project)
The first seven columns in the data table in Figure 2.4 represent factor variables. Professor Yeh selected specific quantities of the seven component materials and then tested the compressive strength as the concrete aged. The eighth column, Age, shows the number of days elapsed since the concrete was formulated, and the ninth column is the response variable, Compressive Strength. In the course of his experiments, Professor Yeh repeatedly tested different formulations, measuring compressive strength after varying numbers of days. To see the structure of the data, let’s look more closely at the two columns.
8. Choose Analyze ► Distribution.
9. As shown in Figure 2.5, select the Cement and Age columns and click OK.
Figure 2.5: The Distribution Dialog Box
The Distribution platform generates two graphs and several statistics. We will study these in detail in Chapter 3. For now, you just need to know that the graphs, called histograms, display the variation within the two data columns. The lengths of the bars indicate the number of rows corresponding to each value. For example, there are many rows with concrete mixtures containing about 150 kg of cement, and very few with 100 kg.
10. Move your cursor over the Cement histogram and click the bar corresponding to values just below 400 kg of cement.
When you click that bar, the entire bar darkens, indicating that the rows corresponding to mixtures with that quantity (380 kg of cement, it turns out) are selected. Additionally, small portions of several bars in the Age histogram are also darkened, representing the same rows.
11. To see the relationships between a histogram bar, the data table, and the other histogram, we want to make the Distribution Report and Data Table visible at the same time. Move your cursor over the tab labeled Concrete – Distribution of Cement, Age. Click and drag downward; release the button over the Dock Above zone. The windows should look like Figure 2.6. Save your project once more.
Figure 2.6: Select Rows by Selecting One Bar in a Graph
Finally, look at the data table. Within the data grid, several visible rows are highlighted. All of them share the same value of Cement. Within the Rows panel, we see that we have now altered the row state of 76 rows by selecting them.
Look more closely at rows 7 and 8, which are the first two of the selected rows. Both represent seven-part mixtures following the same recipe of the seven materials. Observation #7 was taken at 365 days and observation #8 at 28 days. These observations are not listed chronologically, but rather are a randomized sequence typical of an experimental data table.
Row 16 is the next selected row. This formulation shares several ingredients in the same proportion as rows 7 and 8, but the amounts of other ingredients differ. This is also typical of an experiment. The researcher selected and tested different formulations. Because the investigator, Professor Yeh, manipulated the factor values, we find this type of repetition within the data table. And because Professor Yeh was able to select the factor values deliberately and in a controlled way, he was able to draw conclusions about which mixtures will yield the best compressive strength.
12. Before proceeding to the next section, restore the Distribution report to its original position. Move your cursor over the title bar Concrete – Distribution of Cement, Age. Click, drag, and release near the center of the screen over the Dock Tab drop zone.
Observational Data—An Example
Of course, experimentation is not always practical, ethical, or legal. Medical and pharmaceutical researchers must follow extensive regulations, can experiment with dosages and formulations of medications, and may randomly assign patients to treatment groups, but cannot randomly expose patients to diseases. Investors do not have the ability to manipulate stock prices (if they do and get caught, they go to prison).
Now open the data table called Stock Index Weekly. This data table contains time series data for six different major international stock markets during 2008, the year in which a worldwide financial crisis occurred. A stock index is a weighted average of the prices of a sample of publicly traded companies. In this table, we find the weekly index values for the following stock market indexes, as well as the average number of shares traded per week:
● Nikkei 225 Index – Tokyo
● FTSE100 Index – London
● S&P 500 Index – New York
● Hang Seng Index – Hong Kong
● IGBM Index – Madrid
● TA100 Index – Tel Aviv
The first column is the observation date. Note that the observations are basically every seven days after the second week. The other columns simply record the index and volume values as they occurred. It was not possible to set or otherwise control any of these values.
Survey Data—An Example
Survey research is conducted in the social sciences, in public health, and in business. In a survey, researchers pose carefully constructed questions to respondents, and then the researchers or the respondents