Fundamentals of Programming in SAS. James Blum


a forward slash

      2. DLM = ',' causes SAS to move to a new field when it encounters a comma

      3. DLM = ',/' causes SAS to move to a new field when it encounters either a comma or a forward slash
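
      As a small illustration of the third case, the following sketch reads in-stream records that mix commas and forward slashes. The data set name, variable names, and data values are made up for this example and do not come from the IPUMS data.

```sas
/* Hypothetical example: each record mixes two delimiters. */
data work.DlmDemo;
  infile datalines dlm = ',/';      /* a comma OR a slash ends a field */
  input Name $ Month Day Year;
  datalines;
Ann,3/14/2005
Bob,11/2/2005
;
run;

proc print data = work.DlmDemo;
run;
```

      Each record should produce four fields: one character value and three numeric values. Because the DSD option is not in effect here, consecutive delimiters would be treated as a single delimiter rather than as a missing value.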

      Introduction to Variable Attributes

      In SAS, the amount of memory allocated to a variable is called the variable’s length; length is one of several attributes that each variable possesses. Other attributes include the name of the variable, its position in the data set (1st column, 2nd column, ...), and its type (character or numeric). As with all the variable attributes, the length is set either by use of a default value or by explicitly setting a value.

      By default, both numeric and character variables have a length of eight bytes. For character variables, one byte of memory can hold one character in the English language. Thus, the DATA step truncates several values of State, City, and MortgageStatus from Input Data 2.8.4 since they exceed the default length of eight bytes. For numeric variables, the default length of eight bytes is sufficient to store up to 16 decimal digits (commonly known as double-precision). When using the Microsoft Windows® operating system, numeric variables have a minimum allowable length of three bytes and a maximum length of eight bytes. Character variables may have a minimum length of 1 byte and a maximum length of 32,767 bytes. While there are many options and statements that affect the length of a variable implicitly, the LENGTH statement allows for explicit declaration of the length and type attributes for any variables. Program 2.8.5 demonstrates the usage of the LENGTH statement.
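
      The default-length behavior described above can be seen in a minimal sketch before turning to the LENGTH statement. The data set name and the two in-stream records are hypothetical; only the eight-byte default comes from the discussion above.

```sas
/* Hypothetical data: State is read with list input and no LENGTH  */
/* statement, so it receives the default length of eight bytes.    */
data work.DefaultLength;
  infile datalines dsd;
  input State $ Code;
  datalines;
Mississippi,1
Ohio,2
;
run;

/* PROC CONTENTS lists the type and length of each variable. */
proc contents data = work.DefaultLength;
run;

proc print data = work.DefaultLength;
run;
```

      In this sketch, the first value of State is expected to be stored as Mississi, since only the first eight characters fit in the default length, and PROC CONTENTS should report a length of 8 for both variables.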

      Program 2.8.5: Using the LENGTH Statement

      data work.Ipums2005Basic;
        length state $ 20 City$ 25 MortgageStatus$50;
        infile RawData("IPUMS2005basic.csv") dsd;
        input Serial State City  CityPop Metro
              CountyFIPS Ownership $  MortgageStatus$
              MortgagePayment HHIncome HomeValue;
      run;

      proc print data = work.Ipums2005Basic(obs = 5);
      run;

       The LENGTH statement sets the lengths of State, City, and MortgageStatus to 20, 25, and 50 characters, respectively, with the dollar sign indicating these are character variables. Separating the dollar sign from the variable name or length value is optional, though good programming practices dictate using a consistent style to improve readability.

       Type (character or numeric) is an attribute that cannot be changed in the DATA step once it has been established. Because the LENGTH statement sets these variables as character, the dollar sign is optional in the INPUT statement. However, good programming practices generally dictate including it for readability and so that removal of the LENGTH statement does not lead to a data type mismatch. (This would be an execution-time error.)

       As in , the spacing between the dollar sign and variable name is optional in the INPUT statement as well. Good programming practices still dictate selecting a consistent spacing style.

      Output 2.8.5 shows the results of explicitly setting the length of the State, City, and MortgageStatus variables. In addition to the lengths of these three variables changing, their column position in the SAS data set has changed as well. Variables are added to the data set based on the order they are encountered during compilation of the DATA step, so since the LENGTH statement precedes the INPUT statement, it has actually changed two attributes—length and position—for these three variables (while also defining the type attribute as character).

      Output 2.8.5: Using the LENGTH Statement (Partial Listing)

      Obs  state    City                      MortgageStatus                                 Serial  CityPop  Metro  CountyFIPS
        1  Alabama  Not in identifiable city  N/A                                                 2        0      4          73
        2  Alabama  Not in identifiable city  N/A                                                 3        0      1           0
        3  Alabama  Not in identifiable city  Yes, mortgaged/ deed of trust or similar debt       4        0      4          73
        4  Alabama  Not in identifiable city  N/A                                                 5        0      1           0
        5  Alabama  Not in identifiable city  No, owned free and clear                            6        0      3          97

      Obs  Ownership  MortgagePayment  HHIncome  HomeValue
        1  Rented                   0     12000    9999999
        2  Rented                   0     17800    9999999
        3  Owned                  900    185000     137500
        4  Rented                   0      2000    9999999
        5  Owned                    0     72600      95000
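
      The effect of statement order on the position attribute can be checked with the VARNUM option of PROC CONTENTS, which lists variables by their position in the data set rather than alphabetically. The sketch below uses a hypothetical three-variable data set with a single in-stream record so that it runs without the raw IPUMS file.

```sas
/* Hypothetical data set: City appears in the LENGTH statement, so */
/* it is encountered first during compilation.                     */
data work.PositionDemo;
  length City $ 25;
  infile datalines dsd;
  input Serial City $ CityPop;
  datalines;
2,Not in identifiable city,0
;
run;

/* VARNUM orders the variable list by position in the data set. */
proc contents data = work.PositionDemo varnum;
run;
```

      Under these assumptions, City should be listed in position 1, followed by Serial and CityPop, even though Serial comes first in the INPUT statement.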

      As with the type attribute, SAS does not allow the position and length attributes to change after their initial values are set. Attempting to change the length attribute after the INPUT statement, as shown in Program 2.8.6, results in a warning in the Log.

      Program 2.8.6: Using the LENGTH Statement After the INPUT Statement

      data work.Ipums2005Basic;
        infile RawData("IPUMS2005basic.csv") dsd;
        input Serial State $ City $ CityPop Metro
              CountyFIPS Ownership $ MortgageStatus $
              MortgagePayment HHIncome HomeValue;
        length state $20 City $25 MortgageStatus $50;
      run;

      Log 2.8.6: Warning Generated by Attempting to Reset Length

      WARNING: Length of character variable State has already been set. Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.

      Tab-Delimited Files

      If the delimiter is not a standard keyboard character, such as the tab used in tab-delimited files, an alternate method is used to specify the delimiter via its hexadecimal code. While the correct hexadecimal representation depends on the operating system, Microsoft Windows and Unix/Linux machines typically use ASCII codes. The ASCII hexadecimal code for a tab is 09 and is written in the SAS language as '09'x; the x appended to the literal value of 09 instructs the compiler to make the conversion from hexadecimal. Program 2.8.7 uses hexadecimal encoding in the DLM= option to correctly set the delimiter to a tab. The results of Program 2.8.7 are identical to those of Program 2.8.5.

      Program 2.8.7: Reading Tab-Delimited Data

      data work.Ipums2005Basic;
        length state $ 20 City $ 25 MortgageStatus $ 50;
        infile RawData('ipums2005basic.txt') dlm = '09'x;
        input Serial State $ City $ CityPop Metro
              CountyFIPS Ownership $ MortgageStatus $
              MortgagePayment HHIncome HomeValue;
      run;

      Because there are no missing values denoted by sequential tabs, nor any tabs included in data values, the DSD option is no longer needed in the INFILE statement for this program.
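
      For readers without the tab-delimited IPUMS file at hand, the following sketch first writes a one-record tab-delimited file to a temporary location and then reads it back with DLM = '09'x. The fileref TabDemo and the data set name are hypothetical; the data values are taken from the first row of Output 2.8.5.

```sas
/* Hypothetical fileref pointing to a temporary file. */
filename TabDemo temp;

/* Write one tab-delimited record using list output. */
data _null_;
  file TabDemo dlm = '09'x;
  length State $ 20 City $ 25;
  State = 'Alabama';
  City = 'Not in identifiable city';
  CityPop = 0;
  put State City CityPop;
run;

/* Read the record back, using a tab as the only delimiter. */
data work.TabDemo;
  length State $ 20 City $ 25;
  infile TabDemo dlm = '09'x;
  input State $ City $ CityPop;
run;

proc print data = work.TabDemo;
run;
```

      Because the tab is the only delimiter in effect, the embedded blanks in the City value should be read as part of the value rather than as field separators.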

      To
