Fundamentals of Programming in SAS. James Blum

be written using the OR operator, as was done in . The list is given as a set of values separated by commas or spaces and enclosed in parentheses.

       BETWEEN-AND allows for simplification of a value range that can otherwise be written using AND between appropriate comparisons, as was done in .

       The NOT operator allows the truth condition to be made the opposite of what is specified. This is a slight improvement over , as the list of values not desired is shorter than the list of those that are.
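      As a minimal sketch of the three operators described above, the step below subsets Metro to the values 2, 3, and 4 in three logically equivalent ways. It assumes that the BookData library is assigned and that Metro takes only the integer values 0 through 4 with no missing values; under other data, the three conditions are not interchangeable. Only one WHERE statement is active at a time because a later WHERE statement in the same step replaces an earlier one.

```sas
/* Sketch only: assumes the BookData library is assigned and that
   Metro takes only the integer values 0 through 4, none missing. */
proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
  class Metro;
  var HHIncome;
  where Metro in (2,3,4);            /* IN: list of desired values  */
  /* Logically equivalent alternatives (use one at a time):         */
  /* where Metro between 2 and 4;      BETWEEN-AND: inclusive range */
  /* where Metro not in (0,1);         NOT: exclude unwanted values */
run;
```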

      Adding any of these WHERE statements (or any other logically equivalent WHERE statement) to Program 2.5.2 produces the results shown in Table 2.6.3.

      Table 2.6.3: Using WHERE to Subset Results to Specific Values of the Metro Variable

Analysis Variable : HHIncome

METRO                            N   Mean  Std Dev  Minimum  Maximum
Metro, Inside City          154368  60328    70874   -19998  1391000
Metro, Outside City         340982  77648    75907   -29997  1739770
Metro, City Status Unknown  340909  64335    66110   -22298  1536000

      The tools available allow for conditioning on more than one variable, and the variable(s) conditioned on need only be in the data set in use and do not have to be present in the output generated. In Program 2.6.1, the output is conditioned additionally on households known to have an outstanding mortgage.

      Program 2.6.1: Conditioning on a Variable Not Used in the Analysis

      proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
        class Metro;
        var HHIncome;
        format Metro Metro.;
        /* MortgageStatus is not in the VAR or CLASS statement; */
        /* its literals must match the data values exactly.     */
        where Metro in (2,3,4)
              and
              MortgageStatus in
                ('Yes, contract to purchase',
                 'Yes, mortgaged/ deed of trust or similar debt');
      run;

      Output 2.6.1: Conditioning on a Variable Not Used in the Analysis

Analysis Variable : HHIncome

METRO                            N   Mean  Std Dev  Minimum  Maximum
Metro, Inside City           57881  86277    82749   -19998  1361000
Metro, Outside City         191021  96319    80292   -29997  1266000
Metro, City Status Unknown  167359  83879    72010   -19998  1407000

      The condition on the MortgageStatus variable is a bit daunting, particularly because matching character values is a precise operation. Seemingly simple differences, such as casing or spacing, produce values that do not match. Therefore, the literals used in Program 2.6.1 are specified to match the data exactly. Section 3.9 introduces functions that are useful for creating consistency among character values, along with others that allow for extraction and use of relevant portions of a string. However, the WHERE statement provides some special operators, shown in Table 2.6.4, that simplify these types of cases without the need for a function.

      Table 2.6.4: Operators for General Comparisons

Symbol  Mnemonic  Logic
?       CONTAINS  True result if the specified value is contained in the data value (character only).
        LIKE      True result if data value matches the specified value which may include wildcards. _ is any single character, % is any set of characters.

      Program 2.6.2 offers two methods for simplifying the condition on MortgageStatus, one using CONTAINS, the other using LIKE. Either reproduces Output 2.6.1.

      Program 2.6.2: Conditioning on a Variable Using General Comparison Operators

      /* Method 1: CONTAINS */
      proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
        class Metro;
        var HHIncome;
        format Metro Metro.;
        where Metro in (2,3,4) and MortgageStatus contains 'Yes';
      run;

      /* Method 2: LIKE with the % wildcard */
      proc means data=BookData.IPUMS2005Basic nonobs maxdec=0;
        class Metro;
        var HHIncome;
        format Metro Metro.;
        where Metro in (2,3,4) and MortgageStatus like '%Yes%';
      run;

       CONTAINS checks whether the data value contains the string Yes; again, note that the casing must be correct to ensure a match. Also, ensure that single or double quotation marks enclose the value to search for. In this case, without the quotation marks, Yes forms a legal variable name and is interpreted by the compiler as a reference to a variable.

       LIKE allows for the use of wildcards as substitutes for non-essential character values. Here the % wildcard before and after Yes results in a true condition if Yes appears anywhere in the string and is thus logically equivalent to the CONTAINS conditioning above.
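      The behavior of these operators can be tried on a small scale. The sketch below uses a hypothetical data set, work.Demo, with made-up values (it is not part of the book's data) to show the ? symbol for CONTAINS, the case sensitivity of the match, and the two LIKE wildcards.

```sas
/* Hypothetical toy data for illustration only; not from the book. */
data work.Demo;
  infile datalines truncover;
  input Status $char40.;
  datalines;
Yes, contract to purchase
No, owned free and clear
yes, lowercase value
N/A
;
run;

/* ? is the symbol form of CONTAINS. The match is case-sensitive,  */
/* so only the row beginning with uppercase Yes is selected.       */
proc print data=work.Demo;
  where Status ? 'Yes';
run;

/* No leading %, so Yes must start the value. */
proc print data=work.Demo;
  where Status like 'Yes%';
run;

/* _ stands for exactly one character: this selects the row that   */
/* has any single character followed by "o," (the No, ... row).    */
proc print data=work.Demo;
  where Status like '_o,%';
run;
```

      Dropping the leading % narrows the LIKE condition from "appears anywhere" to "begins with", which is one reason to prefer LIKE over CONTAINS when the position of the text matters.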

      To produce tables of frequencies and relative frequencies (percentages) like those shown for the case study in Outputs 2.2.3 and 2.2.4, the FREQ procedure is the tool of choice, and this section covers its fundamentals.

      As in previous sections, the examples here use the IPUMS2005Basic SAS data set, so make sure the BookData library is assigned. As a first step, enter and submit Program 2.7.1. (Note that the use of labels has been re-established in the OPTIONS statement.)

      Program 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement

      options label;

      proc freq data=BookData.IPUMS2005Basic;
        table metro mortgageStatus;
      run;

      The TABLE statement allows for specification of the variables to summarize, and a space-delimited list of variables produces a one-way frequency table for each, as shown in Output 2.7.1.
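      For readers without the book's data at hand, the same pattern can be tried on SASHELP.CLASS, a sample data set that ships with SAS. This substitution is an assumption made for illustration and is not part of the book's example. Listing Sex and Age in one TABLE statement should produce two separate one-way frequency tables, one per variable.

```sas
/* Sketch using SASHELP.CLASS in place of the book's data set. */
proc freq data=sashelp.class;
  table Sex Age;   /* space-delimited list: one table per variable */
run;
```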

      Output 2.7.1: PROC FREQ with Variables Listed Individually in the TABLE Statement

Metropolitan status

METRO  Frequency  Percent  Cumulative Frequency  Cumulative Percent
    0      92028     7.94                 92028                7.94
    1     230775    19.91                322803               27.85
    2     154368    13.32                477171               41.17
    3     340982    29.42                818153               70.59
    4     340909    29.41               1159062              100.00

MortgageStatus             Frequency  Percent  Cumulative Frequency  Cumulative Percent
N/A                           303342    26.17                303342               26.17
No, owned free and clear      300349    25.91                603691               52.08
Yes, contract to purchase       9756     0.84                613447               52.93
Yes,
