Fundamentals of Programming in SAS. James Blum


      The structure of BY groups in PROC PRINT can be altered slightly through use of an ID statement, as shown in Program 2.3.7. Assuming the variables listed in the ID statement match those in the BY statement, BY-group variables are placed as the left-most columns of each table, rather than between tables.

      Program 2.3.7: Using BY and ID Statements Together in PROC PRINT

      proc print data= work.sorted noobs label;

      by MortgageStatus State;

      id MortgageStatus State;

      var MortgagePayment HomeValue Metro;

      label HomeValue='Value of Home' state='State';

      format HomeValue MortgagePayment dollar9. MortgageStatus $9.;

      run;

      Output 2.3.7: Using BY and ID Statements Together in PROC PRINT (First 2 of 6 Groups Shown)

MortgageStatus  State           MortgagePayment  Value of Home  METRO
No, owned       North Carolina  $0               $162,500       0
                                $0               $45,000        1
                                $0               $5,000         1

MortgageStatus  State           MortgagePayment  Value of Home  METRO
No, owned       South Carolina  $0               $137,500       3
                                $0               $95,000        4
                                $0               $45,000        3

      PROC PRINT is limited in its ability to do computations (later in this text, the REPORT procedure is used to create various summary tables); however, it can compute sums of numeric variables with the SUM statement, as shown in Program 2.3.8.

      Program 2.3.8: Using the SUM Statement in PROC PRINT

      proc print data= work.sorted noobs label;

      by MortgageStatus State;

      id MortgageStatus State;

      var MortgagePayment HomeValue Metro;

      sum MortgagePayment HomeValue;

      label HomeValue='Value of Home' state='State';

      format HomeValue MortgagePayment dollar9. MortgageStatus $9.;

      run;

      Output 2.3.8: Using the SUM Statement in PROC PRINT (Last of 6 Groups Shown)

MortgageStatus  State           MortgagePayment  Value of Home  METRO
Yes, mort       South Carolina  $360             $75,000        4
                                $500             $65,000        3
                                $200             $32,500        4
Yes, mort       South Carolina  $1,060           $172,500
Yes, mort                       $2,200           $315,000
                                $4,230           $1200000

      Sums are produced at the end of each BY group (and the SUMBY statement is available to modify this behavior), and at the end of the full table. Note that the format applied to the HomeValue column is not sufficient to display the grand total with the dollar sign and comma. If a format is of insufficient width, SAS removes what it determines to be the least important characters. However, it is considered good programming practice to determine the minimum format width needed for all values a format is applied to. If the format does not include sufficient width to display the value with full precision, then SAS may adjust the included format to a different format. See Chapter Note 3 in Section 2.12 for further discussion on format widths.
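      As a brief illustration of the width issue, the sketch below writes the grand total from Output 2.3.8 to the log using two different widths. The DATA _NULL_ step and the variable name Total are assumptions made for this illustration and are not part of the original programs. The wider format has room for the dollar sign, both commas, and all seven digits.

```sas
/* Illustration only: Total is a hypothetical variable holding the
   HomeValue grand total shown in Output 2.3.8. */
data _null_;
  Total = 1200000;
  put 'dollar9.  : ' Total dollar9.;   /* too narrow for $1,200,000, which needs 10 characters */
  put 'dollar10. : ' Total dollar10.;  /* wide enough for the dollar sign and both commas */
run;
```

      In Program 2.3.8, the same idea would mean widening the format applied to HomeValue, for example to dollar10., assuming no larger total occurs in the data.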

      Producing tables of statistics like those shown for the case study in Outputs 2.2.1 and 2.2.2 uses the MEANS procedure. This section covers the fundamentals of PROC MEANS, including how to select variables for analysis, choose statistics, and separate analyses across categories.

      To begin, make sure the BookData library is assigned as done in Chapter 1, submit PROC CONTENTS on the IPUMS2005Basic SAS data set from the BookData library, and review the output. Also, to ensure familiarity with the data, open the data set for viewing or run the PRINT procedure to direct it to an output table. Once these steps are complete, enter and submit the code given in Program 2.4.1.
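      The preparatory steps described above might look like the following sketch. The path in the LIBNAME statement is a placeholder, not the actual location of the data, and the OBS= value of 10 is an arbitrary choice to keep the printed table short.

```sas
/* Placeholder path: replace it with the folder that holds the book's data sets. */
libname BookData 'path-to-book-data';

/* Review variable names, types, and other attributes. */
proc contents data=BookData.IPUMS2005Basic;
run;

/* Print only the first 10 observations to get familiar with the data. */
proc print data=BookData.IPUMS2005Basic(obs=10);
run;
```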

      Program 2.4.1: Default Statistics and Behavior for PROC MEANS

      options nolabel;

      proc means data=BookData.IPUMS2005Basic;

      run;

      For variables that have labels, PROC MEANS includes them as a column in the output table; using NOLABEL in the OPTIONS statement suppresses their use. Here DATA= is technically optional because the default data set in any SAS session is the last data set created. However, if no data sets have been created during the session, which is the most likely scenario currently, PROC MEANS does not have a data set to process unless this option is provided. Beyond having a data set to work with, no other options or statements are required for PROC MEANS to compile and execute successfully. In this case, the default behavior, as shown in Output 2.4.1, is to summarize all numeric variables on a set of five statistics: number of nonmissing observations, mean, standard deviation, minimum, and maximum.
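      To make the defaults concrete, the sketch below requests the same five statistics explicitly with statistic keywords in the PROC MEANS statement. It assumes the BookData library is already assigned and should produce the same table as Program 2.4.1. The final OPTIONS statement is an addition for this illustration; it restores labels, because NOLABEL otherwise stays in effect for the rest of the session.

```sas
/* Assumes the BookData library is already assigned, as described in the text. */
options nolabel;

/* N, MEAN, STD, MIN, and MAX are the statistics PROC MEANS reports by default. */
proc means data=BookData.IPUMS2005Basic n mean std min max;
run;

/* Restore the default so that later procedures display labels again. */
options label;
```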

      Output 2.4.1: Default Statistics and Behavior for PROC MEANS

Variable         N        Mean         Std Dev      Minimum     Maximum
SERIAL           1159062  621592.24    359865.41    2.0000000   1245246.00
COUNTYFIPS       1159062  42.2062901   78.9543285   0           810.0000000
METRO            1159062  2.5245354    1.3085302    0           4.0000000
CITYPOP          1159062  2916.66      12316.27     0           79561.00
MortgagePayment  1159062  500.2042634  737.9885592  0           7900.00
HHIncome         1159062  63679.84     66295.97     -29997.00   1739770.00
HomeValue        1159062  2793526.49   4294777.18   5000.00     9999999.00

      SAS differentiates variable types as numeric and character only; therefore, variables stored as numeric that are not quantitative are summarized even if those summaries do not make sense. Here, the Serial, CountyFIPS, and Metro variables are stored as numbers, but means and standard deviations are of no utility on these since they are nominal. It is, of course, important to understand the true role and level of measurement (for instance, nominal versus ratio) for the variables in the data set being analyzed.

      To select the variables for analysis, the MEANS procedure includes the VAR statement. Any variables listed in the VAR statement must be numeric and should also be appropriate for quantitative summary statistics. As in the previous example, the summary for each variable is listed in its own row in the output table. (If only one variable is provided, it is named in the header above the table instead of in the first column.) Program 2.4.2 modifies Program 2.4.1 to summarize only the truly quantitative variables from BookData.IPUMS2005Basic, with the results shown in Output 2.4.2.

      Program 2.4.2: Selecting Analysis Variables Using the VAR Statement in MEANS

      proc means data=BookData.IPUMS2005Basic;

      var Citypop MortgagePayment HHIncome HomeValue;

      run;

      Output 2.4.2: Selecting Analysis Variables Using the VAR Statement in MEANS

Variable         N        Mean         Std Dev      Minimum     Maximum
CITYPOP          1159062  2916.66      12316.27     0           79561.00
MortgagePayment  1159062  500.2042634  737.9885592  0           7900.00
HHIncome         1159062  63679.84     66295.97     -29997.00   1739770.00
HomeValue        1159062  2793526.49   4294777.18   5000.00     9999999.00
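      The single-variable case mentioned earlier can be tried with a small variation on Program 2.4.2. The choice of HHIncome here is arbitrary and made only for illustration; the sketch again assumes the BookData library is assigned.

```sas
/* With only one analysis variable, PROC MEANS names it in the header
   above the table rather than in a first column. */
proc means data=BookData.IPUMS2005Basic;
  var HHIncome;
run;
```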
