Fundamentals of Programming in SAS. James Blum
BETWEEN-AND simplifies a value range that could otherwise be written as two comparisons joined by AND (for example, Metro ge 2 and Metro le 4).
The NOT operator reverses the truth value of the condition that follows it. Here that is a slight improvement over listing the desired values with IN, because the list of unwanted values (0 and 1) is shorter than the list of wanted ones (2, 3, and 4).
Adding any of these WHERE statements (or any other logically equivalent WHERE statement) to Program 2.5.2 produces the results shown in Table 2.6.3.
Table 2.6.3: Using WHERE to Subset Results to Specific Values of the Metro Variable
Analysis Variable : HHIncome

| METRO | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| Metro, Inside City | 154368 | 60328 | 70874 | -19998 | 1391000 |
| Metro, Outside City | 340982 | 77648 | 75907 | -29997 | 1739770 |
| Metro, City Status Unknown | 340909 | 64335 | 66110 | -22298 | 1536000 |
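The IN, BETWEEN-AND, and NOT IN forms of the WHERE statement can be compared on a small data set. The sketch below is illustrative only: work.metroDemo and the output data set names are hypothetical, and the toy data holds one observation for each Metro code from 0 to 4 plus one missing value. The IN and BETWEEN-AND forms keep the same three rows, while NOT IN also keeps the missing row, because a missing Metro is not in (0,1). The forms are interchangeable only when Metro takes no values outside 0 through 4 and is never missing.

```sas
/* Hypothetical toy data: Metro codes 0-4 plus one missing value (.) */
data work.metroDemo;
  input Metro @@;
  datalines;
0 1 2 3 4 .
;
run;

/* Form 1: list the wanted codes */
data work.viaIn;
  set work.metroDemo;
  where Metro in (2,3,4);
run;

/* Form 2: a closed range (would also keep non-integers such as 2.5) */
data work.viaBetween;
  set work.metroDemo;
  where Metro between 2 and 4;
run;

/* Form 3: negate the list of unwanted codes (also keeps missing values) */
data work.viaNotIn;
  set work.metroDemo;
  where Metro not in (0,1);
run;

/* Write each subset's row count to the log */
data _null_;
  if 0 then set work.viaIn nobs=nIn;
  if 0 then set work.viaBetween nobs=nBetween;
  if 0 then set work.viaNotIn nobs=nNotIn;
  put nIn= nBetween= nNotIn=;
  stop;
run;
```

The log should report nIn=3 nBetween=3 nNotIn=4, so checking for missing values is worthwhile before relying on NOT IN.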
A WHERE statement can condition on more than one variable, and any variable used in a condition only needs to be in the data set being read; it does not have to appear in the output. In Program 2.6.1, the output is further restricted to households known to have an outstanding mortgage.
Program 2.6.1: Conditioning on a Variable Not Used in the Analysis
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
  class Metro;
  var HHIncome;
  format Metro Metro.;
  /* MortgageStatus is not analyzed, but it can still be used to subset */
  where Metro in (2,3,4)
        and
        MortgageStatus in
          ('Yes, contract to purchase',
           'Yes, mortgaged/ deed of trust or similar debt');
run;
Output 2.6.1: Conditioning on a Variable Not Used in the Analysis
Analysis Variable : HHIncome

| METRO | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| Metro, Inside City | 57881 | 86277 | 82749 | -19998 | 1361000 |
| Metro, Outside City | 191021 | 96319 | 80292 | -29997 | 1266000 |
| Metro, City Status Unknown | 167359 | 83879 | 72010 | -19998 | 1407000 |
The condition on MortgageStatus is cumbersome, especially because character comparisons are exact: seemingly minor differences in casing or spacing make two values non-matching. For that reason, the literals in Program 2.6.1 are written to match the stored data values exactly. Section 3.9 introduces functions that help make character values consistent, along with others that extract and use relevant portions of a string. However, the WHERE statement also provides special operators, shown in Table 2.6.4, that simplify these cases without requiring a function.
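To see how strict the matching is, the sketch below applies an exact-match WHERE condition to a hypothetical three-row data set (work.statusDemo, invented for illustration) in which the same MortgageStatus value is typed three ways: as it is stored in the data, in lowercase, and with a doubled space after the comma. Only the first row should satisfy the condition. SAS ignores trailing blanks when it compares character values, so the padding added by the variable's declared length does not by itself prevent a match.

```sas
/* Hypothetical toy data: one value typed three slightly different ways */
data work.statusDemo;
  length MortgageStatus $ 50;
  infile datalines truncover;
  /* $CHAR50. keeps internal blanks, including the doubled space in row 3 */
  input MortgageStatus $char50.;
  datalines;
Yes, contract to purchase
yes, contract to purchase
Yes,  contract to purchase
;
run;

/* Exact comparison: casing and internal spacing must both match */
data work.exactMatch;
  set work.statusDemo;
  where MortgageStatus = 'Yes, contract to purchase';
run;

proc print data=work.exactMatch;
run;
```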
Table 2.6.4: Operators for General Comparisons
| Symbol | Mnemonic | Logic |
|---|---|---|
| ? | CONTAINS | True if the specified value is contained in the data value (character only). |
|  | LIKE | True if the data value matches the specified pattern, which may include wildcards: _ matches any single character, and % matches any sequence of zero or more characters. |
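The two LIKE wildcards behave differently: _ stands for exactly one character, while % stands for any run of characters, including none. The sketch below applies two patterns to a hypothetical data set (work.mortgageDemo, invented for illustration) holding one row for each MortgageStatus category that appears in this section. The pattern 'Yes%' should keep the two values that begin with Yes, and '_o,%' should keep only 'No, owned free and clear', the one value whose second and third characters are o and a comma.

```sas
/* Hypothetical toy data: one row per MortgageStatus category */
data work.mortgageDemo;
  length MortgageStatus $ 50;
  infile datalines truncover;
  input MortgageStatus $char50.;
  datalines;
N/A
No, owned free and clear
Yes, contract to purchase
Yes, mortgaged/ deed of trust or similar debt
;
run;

/* % matches any run of characters: values that start with Yes */
data work.startsYes;
  set work.mortgageDemo;
  where MortgageStatus like 'Yes%';
run;

/* _ matches exactly one character: any first character, then "o," */
data work.secondCharO;
  set work.mortgageDemo;
  where MortgageStatus like '_o,%';
run;
```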
Program 2.6.2 offers two methods for simplifying the condition on MortgageStatus, one using CONTAINS, the other using LIKE. Either reproduces Output 2.6.1.
Program 2.6.2: Conditioning on a Variable Using General Comparison Operators
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
  class Metro;
  var HHIncome;
  format Metro Metro.;
  /* CONTAINS: true when Yes appears anywhere in the value */
  where Metro in (2,3,4) and MortgageStatus contains 'Yes';
run;

proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
  class Metro;
  var HHIncome;
  format Metro Metro.;
  /* LIKE: % before and after Yes allows any surrounding characters */
  where Metro in (2,3,4) and MortgageStatus like '%Yes%';
run;
CONTAINS checks whether the data value contains the string Yes; again, the casing must be correct to ensure a match. Also, make sure single or double quotation marks enclose the value to search for. Without the quotation marks, Yes is a legal variable name, so the compiler interprets it as a reference to a variable.
LIKE allows wildcards to stand in for characters that do not matter to the match. Here the % wildcard before and after Yes produces a true condition if Yes appears anywhere in the string, so this condition is logically equivalent to the CONTAINS condition above.
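The equivalence of the two conditions in Program 2.6.2, and the case sensitivity of CONTAINS, can be checked on a hypothetical data set (work.mortgageDemo, invented for illustration and not part of the book's data) with one row per MortgageStatus category. CONTAINS 'Yes' and LIKE '%Yes%' should each keep the same two rows, while CONTAINS 'yes' in lowercase should keep none.

```sas
/* Hypothetical toy data: one row per MortgageStatus category */
data work.mortgageDemo;
  length MortgageStatus $ 50;
  infile datalines truncover;
  input MortgageStatus $char50.;
  datalines;
N/A
No, owned free and clear
Yes, contract to purchase
Yes, mortgaged/ deed of trust or similar debt
;
run;

/* CONTAINS: case-sensitive substring search */
data work.viaContains;
  set work.mortgageDemo;
  where MortgageStatus contains 'Yes';
run;

/* LIKE with % on both sides: Yes anywhere in the value */
data work.viaLike;
  set work.mortgageDemo;
  where MortgageStatus like '%Yes%';
run;

/* Lowercase search string: no category contains "yes" */
data work.viaLowercase;
  set work.mortgageDemo;
  where MortgageStatus contains 'yes';
run;
```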
2.7 Using the FREQ Procedure for Categorical Summaries
To produce tables of frequencies and relative frequencies (percentages) like those shown for the case study in Outputs 2.2.3 and 2.2.4, the FREQ procedure is the tool of choice, and this section covers its fundamentals.
2.7.1 Choosing Analysis Variables in PROC FREQ
As in previous sections, the examples here use the IPUMS2005Basic SAS data set, so make sure the BookData library is assigned. As a first step, enter and submit Program 2.7.1. (Note that the use of labels has been re-established in the OPTIONS statement.)
Program 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement
options label;
proc freq data=BookData.IPUMS2005Basic;
  table metro mortgageStatus;
run;
The TABLE statement allows for specification of the variables to summarize, and a space-delimited list of variables produces a one-way frequency table for each, as shown in Output 2.7.1.
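A self-contained version of the same idea uses a hypothetical six-row data set (work.freqDemo, invented for illustration) in place of BookData.IPUMS2005Basic. Because the TABLE statement lists two variables, PROC FREQ should print two one-way tables. The second step adds the OUT= option, which saves the counts for a table request to a data set with the variables COUNT and PERCENT; when several table requests are listed, OUT= holds only the last one, so a single variable is requested there.

```sas
/* Hypothetical toy data; the pipe (|) separates the two fields */
data work.freqDemo;
  length MortgageStatus $ 50;
  infile datalines dlm='|' truncover;
  input Metro MortgageStatus $;
  datalines;
0|N/A
1|No, owned free and clear
2|Yes, contract to purchase
3|Yes, mortgaged/ deed of trust or similar debt
3|No, owned free and clear
4|Yes, mortgaged/ deed of trust or similar debt
;
run;

/* Space-delimited list: one one-way table per variable */
proc freq data=work.freqDemo;
  table Metro MortgageStatus;
run;

/* OUT= saves COUNT and PERCENT for a single table request */
proc freq data=work.freqDemo noprint;
  table Metro / out=work.metroCounts;
run;

proc print data=work.metroCounts;
run;
```

In the saved data set, PERCENT is 100 times COUNT divided by the number of nonmissing observations, which matches how the Percent column of Output 2.7.1 is formed (for example, 100 × 92028 / 1159062 ≈ 7.94).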
Output 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement
Metropolitan status

| METRO | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 92028 | 7.94 | 92028 | 7.94 |
| 1 | 230775 | 19.91 | 322803 | 27.85 |
| 2 | 154368 | 13.32 | 477171 | 41.17 |
| 3 | 340982 | 29.42 | 818153 | 70.59 |
| 4 | 340909 | 29.41 | 1159062 | 100.00 |

| MortgageStatus | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| N/A | 303342 | 26.17 | 303342 | 26.17 |
| No, owned free and clear | 300349 | 25.91 | 603691 | 52.08 |
| Yes, contract to purchase | 9756 | 0.84 | 613447 | 52.93 |
Yes,
|