SAS Statistics by Example. Ron Cody, EdD

value (0% Min), first quartile (25% Q1), median (50% Median), third quartile (75% Q3), and the maximum value (100% Max). If you supply PROC UNIVARIATE with some options, it can compute quantiles for any values you want, and write these values to a SAS data set.

Image537.png
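
      As a minimal sketch of the quantile options mentioned above, the following step uses the OUTPUT statement with the PCTLPTS= and PCTLPRE= options to write selected percentiles to a data set. The percentile list, the prefix P_, and the data set name SBP_Pctls are illustrative choices, and the step assumes the EXAMPLE libref is already assigned.

```sas
/* Sketch: write the 5th, 10th, 90th, and 95th percentiles of SBP */
/* to a SAS data set. SBP_Pctls and the prefix P_ are hypothetical names. */
proc univariate data=example.Blood_Pressure noprint;
   var SBP;
   output out=SBP_Pctls      /* output data set (illustrative name) */
          pctlpts=5 10 90 95 /* percentiles to compute */
          pctlpre=P_;        /* prefix for the new variable names */
run;

/* List the result; the variables are named P_5, P_10, P_90, and P_95 */
proc print data=SBP_Pctls noobs;
run;
```

      The NOPRINT option suppresses the usual printed output, which is convenient when you only want the data set.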

      (5) This section displays the five lowest and five highest values in your data set. You can quickly check the listed values to ensure that no values are dramatically different from what you expected (perhaps a data entry error occurred).

      Because you used an ID statement, this portion of the output includes the Subj variable. The column labeled Obs is the observation number (which is not very useful because adding observations or sorting the data set will change the observation number). If you want to see more than the five lowest and five highest values, you can supply a procedure option NEXTROBS=n (number of extreme observations) to ask PROC UNIVARIATE to list any number of extreme observations.
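
      For example, a step along the following lines would list the ten lowest and ten highest values. The value 10 is an arbitrary choice for illustration, and the step assumes the EXAMPLE libref is already assigned.

```sas
/* Sketch: list the 10 lowest and 10 highest SBP values (10 is arbitrary) */
proc univariate data=example.Blood_Pressure nextrobs=10;
   id Subj;   /* identify each extreme value by subject */
   var SBP;
run;
```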

      (6) This section tells you how many observations had a missing value for the variable of interest. It also expresses this number as a percentage of all your observations.

      The HISTOGRAM and PROBPLOT statements both produce high-quality SAS/GRAPH output. Depending on your system, these plots are either displayed immediately in your output window, or you need to click on the task bar at the bottom of your screen to see them. The following graph is the result of the HISTOGRAM statement:

Image545.png

      The x-axis shows ranges of SBP. The numbers that are displayed are the midpoints of the SBP ranges. The y-axis displays the percentage of values that fall within these ranges. In the next section, you will learn how to change these data ranges, but the values that SAS chooses for you are usually fine for a quick idea of what your distribution looks like. In this example, the SBP values look similar to those from a normal distribution.

      The PROBPLOT statement produced the next graph:

Image552.png

      If your values came from a normal distribution, they would fall close to the diagonal line on the plot. In this example, the actual data points do not deviate much from this theoretical line, showing that the values of SBP come from a distribution that is close to normal. This outcome is also consistent with the values for skewness and kurtosis that you saw earlier.

      If you want to change the midpoint values displayed on the histogram, you can supply a MIDPOINTS option on the HISTOGRAM statement. For example, if you want midpoints to go from 100 to 170 with each bin representing 5 points, you would write:

      histogram / midpoints=100 to 170 by 5;

      The following histogram used the MIDPOINTS option set to 100 to 170 by 5:

Image560.png

      Finally, you could also see a theoretical normal curve superimposed on your histogram by including the NORMAL option on the HISTOGRAM statement like this:

      histogram / midpoints=100 to 170 by 5 normal;

      The output now shows a normal curve superimposed on your histogram:

Image567.png
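
      The HISTOGRAM statements shown above are fragments. A complete step might look like the following sketch, which assumes the EXAMPLE libref is already assigned; the title text is illustrative.

```sas
/* Sketch: the HISTOGRAM statement in the context of a full step */
title "Histogram of SBP with Custom Midpoints and a Normal Curve";
proc univariate data=example.Blood_Pressure;
   var SBP;
   /* bins centered at 100, 105, ..., 170 with a fitted normal curve */
   histogram / midpoints=100 to 170 by 5 normal;
run;
```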

      SAS 9.2 introduced several important and useful statistical graphics procedures. Among the more useful of these are SGPLOT and SGSCATTER. You can use SGPLOT to produce histograms, box plots, scatter plots, and much more. SGSCATTER displays several plots on a single page (including a scatter plot matrix that is particularly useful). The SG procedures come with a number of built-in styles. You can select different styles for your output without having to do any programming.
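
      As a rough illustration of the scatter plot matrix, the following sketch uses the MATRIX statement of SGSCATTER. It assumes that the Blood_Pressure data set also contains a second numeric variable called DBP, which is not shown in this section, and that the EXAMPLE libref is already assigned.

```sas
/* Sketch: scatter plot matrix of two variables.             */
/* Assumption: DBP exists in the data set; substitute any    */
/* numeric variables you actually have.                      */
title "Scatter Plot Matrix of SBP and DBP";
proc sgscatter data=example.Blood_Pressure;
   matrix SBP DBP / diagonal=(histogram);  /* histograms on the diagonal */
run;
```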

      Let’s see how to produce a histogram and a box plot using SGPLOT.

title "Using SGPLOT to Produce a Histogram";
proc sgplot data=example.Blood_Pressure;
   histogram SBP;
run;

      This HISTOGRAM statement produces a histogram similar in appearance to the one you obtained with the HISTOGRAM statement in PROC UNIVARIATE. As you will learn later, you can change the appearance of the output by selecting an alternate output destination, such as HTML, PDF, or RTF (rich text format), and one of the built-in styles.

      First, let’s see how to display the plot. Then you will learn a few of the more popular options that control the appearance of the output.

      Output from the SG procedures does not usually open automatically after you run the procedure. One way to examine the output is to go to the Results window in SAS Display Manager:

Image575.png

      You see the output from SGPLOT with a plus sign (+) to the left of it. Click the plus sign to expand the list:

Image582.png

      Now double-click the SGPlot Procedure icon to display the histogram. You can use this sequence of steps to display any of the graphs produced by the SG procedures or to display the plots produced by ODS Statistical Graphics that you will see later in this book.

      Finally, after all this clicking, you will see your histogram:

Image590.png

      To produce a box plot of the same data, use the HBOX statement (horizontal box plot) in place of the HISTOGRAM statement:

title "Using SGPLOT to Produce a Box Plot";
proc sgplot data=example.Blood_Pressure;
   hbox SBP;
run;

      Click your way through the Results window to see the following display:

Image599.png

      The left and right sides of the box represent the first and third quartiles (sometimes abbreviated Q1 and Q3). The vertical bar inside the box is the median, and the diamond represents the mean. The lines extending from the left and right sides of the box (called whiskers) reach to the most extreme data values that lie within 1.5 times the interquartile range (Q3 minus Q1) below Q1 and above Q3. Values beyond the whiskers are plotted individually as outliers. If you prefer to see a vertical box plot, use the keyword VBOX instead of HBOX.
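
      To make the 1.5 times interquartile range rule concrete, the following sketch computes the two cutoff values for SBP. The data set names Quartiles and Fences and the variable names Lower_Fence and Upper_Fence are hypothetical, and the step assumes the EXAMPLE libref is already assigned.

```sas
/* Sketch: compute Q1, Q3, and the 1.5*IQR cutoffs for SBP */
proc means data=example.Blood_Pressure noprint;
   var SBP;
   output out=Quartiles q1=Q1 q3=Q3;   /* Quartiles is an illustrative name */
run;

data Fences;                           /* Fences is an illustrative name */
   set Quartiles;
   IQR         = Q3 - Q1;              /* interquartile range */
   Lower_Fence = Q1 - 1.5*IQR;         /* values below this are flagged as outliers */
   Upper_Fence = Q3 + 1.5*IQR;         /* values above this are flagged as outliers */
run;

proc print data=Fences noobs;
   var Q1 Q3 IQR Lower_Fence Upper_Fence;
run;
```

      Each whisker ends at the most extreme data value that is still inside these cutoffs, so the whiskers are usually shorter than the cutoffs themselves.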

      To see the effect of outliers on a box plot, let’s modify two SBP values for subjects 5 and 55 to be 200 and 180, respectively. This modified data set is called Blood_Pressure_Out and is stored in the Work library (making it a temporary SAS data set). You can see the program to create this data set, as well as the request for the box plot, in Program 2.8:

*Program
