Fundamentals of Programming in SAS. James Blum
Чтение книги онлайн.
Читать онлайн книгу Fundamentals of Programming in SAS - James Blum страница 24
The TABLE statement is not required; however, in that case, the default behavior produces a one-way frequency table for every variable in the data set. Therefore, both types of SAS variables, character or numeric, are legal in the TABLE statement. Given that variables listed in the TABLE statement are treated as categorical (in the same manner as variables listed in the CLASS statement in PROC MEANS), it is best to have the summary variables be categorical or be formatted into a set of categories.
The default summaries in a one-way frequency table are: frequency (count), percent, cumulative frequency, and cumulative percent. Of course, the cumulative statistics only make sense if the categories are ordinal, which these are not. Many options are available in the table statement to control what is displayed, and one is given in Program 2.7.2 to remove the cumulative statistics.
Program 2.7.2: PROC FREQ Option for Removing Cumulative Statistics
proc freq data=BookData.IPUMS2005Basic;
table metro mortgageStatus / nocum;
run;
As with the CLASS statement in the MEANS procedure, variables listed in the TABLE statement in PROC FREQ use the format provided with the variable to construct the categories. Program 2.7.3 uses a format defined in Program 2.5.6 to bin the MortgagePayment variable into categories and, as this is an ordinal set, the cumulative statistics are appropriate.
Program 2.7.3: Using a Format to Control Categories for a Variable in the TABLE Statement
proc format;
value Mort
0=’None’
1-350=”$350 and Below”
351-1000=”$351 to $1000”
1001-1600=”$1001 to $1600”
1601-high=”Over $1600”
;
run;
proc freq data=BookData.IPUMS2005Basic;
table MortgagePayment;
format MortgagePayment Mort.;
run;
Output 2.7.3: Using a Format to Control Categories for a Variable in the TABLE Statement
First mortgage monthly payment | ||||
MortgagePayment | Frequency | Percent | CumulativeFrequency | CumulativePercent |
None | 603691 | 52.08 | 603691 | 52.08 |
$350 and Below | 59856 | 5.16 | 663547 | 57.25 |
$351 to $1000 | 283111 | 24.43 | 946658 | 81.67 |
$1001 to $1600 | 128801 | 11.11 | 1075459 | 92.79 |
Over $1600 | 83603 | 7.21 | 1159062 | 100.00 |
The FREQ procedure is not limited to one-way frequencies—special operators between variables in the TABLE statement allow for construction of multi-way tables.
2.7.2 Multi-Way Tables in PROC FREQ
The * operator constructs cross-tabular summaries for two categorical variables, which includes the following statistics:
cross-tabular and marginal frequencies
cross-tabular and marginal percentages
conditional percentages within each row and column
Program 2.7.4 summarizes all combinations of Metro and MortgagePayment, with Metro formatted to add detail and MortgagePayment formatted into the bins used in the previous example.
Program 2.7.4: Using the * Operator to Create a Cross-Tabular Summary with PROC FREQ
proc format;
value METRO
0 = “Not Identifiable”
1 = “Not in Metro Area”
2 = “Metro, Inside City”
3 = “Metro, Outside City”
4 = “Metro, City Status Unknown”
;
value Mort
0=’None’
1-350=”$350 and Below”
351-1000=”$351 to $1000”
1001-1600=”$1001 to $1600”
1601-high=”Over $1600”
;
run;
proc freq data=BookData.IPUMS2005Basic;
table Metro*MortgagePayment;
format Metro Metro. MortgagePayment Mort.;
run;
The first variable listed in any request of the form A*B is placed on the rows in the table. Requesting MortgagePayment*Metro transposes the table and the included summary statistics.
The format applied to the Metro variable is merely a change in display and has no effect on the structure of the table—it is five rows with or without the format. The format on MortgagePayment is essential to the column structure—allowing each unique value of MortgagePayment to form a column does not produce a useful summary table.
Output 2.7.4: Using the * Operator to Create a Cross-Tabular Summary with PROC FREQ
Table of METRO by MortgagePayment | ||||||
METRO(Metropolitan status) | MortgagePayment(First mortgage monthly payment) | |||||
FrequencyPercentRow PctCol Pct | None | $350 and Below | $351 to $1000 | $1001 to $1600 | Over $1600 | Total |
Not Identifiable | 493794.2653.668.18 | 69790.607.5811.66 | 254882.2027.709.00 | 73070.637.945.67 | 28750.253.123.44 | 920287.94 |
Not in Metro Area | 13431411.5958.2022.25 | 216981.879.4036.25 | 609485.2626.4121.53 | 104640.904.538.12 | 33510.291.454.01 | 23077519.91 |
Metro, Inside City | 964878.3262.5015.98 | 44100.382.867.37 | 288662.4918.7010.20 | 140491.219.1010.91 | 105560.916.8412.63 | 15436813.32 |
Metro, Outside City | 14996112.9443.9824.84 | 121481.053.5620.30 | 793886.8523.2828.04 | 563304.8616.5243.73 | 431553.7212.6651.62 | 34098229.42 |
Metro, City Status Unknown | 17355014.9750.9128.75 | 146211.264.2924.43 | 884217.6325.9431.23 | 406513.5111.9231.56 | 236662.046.9428.31 | 34090929.41 |
Total | 60369152.08 | 598565.16 | 28311124.43 | 12880111.11 | 836037.21 | 1159062100.00 |
Various options are available to control the displayed statistics. Program 2.7.5 illustrates some of these with the result shown in Output 2.7.5.
Program 2.7.5: Using Options in the TABLE Statement.