Fundamentals of Programming in SAS. James Blum


work.Utility2001A;

      infile Util2001;

      input Serial$ Electric Gas Water Fuel;

      run;

       The FILENAME statement creates a file reference, called a fileref, named Util2001. Naming conventions for a fileref are the same as those for a libref.

       The path specified, which can be relative or absolute as in Program 2.8.1, includes the file name. SAS assigns the fileref Util2001 to this file.

       The INFILE statement now references the fileref Util2001 rather than the path or file name. Note that quotation marks are not used around Util2001 because it is to be interpreted as a fileref, not as a file name or path.

      Program 2.8.3: Associating the FILENAME Statement with a Folder

      filename RawData '--insert path to folder here--';

      data work.Utility2001B;

      infile RawData("Utility 2001.prn");

      input Serial$ Electric Gas Water Fuel;

      run;

       It is assumed here that the path, either relative or absolute, points to a folder and not a specific file. In that case, the FILENAME statement associates a folder with the fileref RawData. The path specified should be to the folder containing the raw files downloaded from the author page, much like the BookData library was assigned to the folder containing the SAS data sets.

       The INFILE statement references both the fileref and the file name. Although the file reference can be made without the quotation marks in certain cases, good programming practice includes the quotation marks.

      Since Programs 2.8.2 and 2.8.3 each generate the same result as Program 2.8.1 but require slightly more code, the benefits of using the FILENAME statement may not be obvious. The form of the FILENAME statement in Program 2.8.2 is useful if a single file needs to be read repeatedly under different conditions, allowing the multiple references to that file to be shortened. More commonly, the form used in Program 2.8.3 is more efficient when reading multiple files from a common location. Again, if the path specified is to the folder containing the raw files downloaded from the author page, the fileref RawData refers to the location for all non-SAS data sets used in examples for Chapters 2 through 7.
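      As a rough illustration of reading a single file repeatedly under different conditions, the following sketch assigns a fileref to one file and uses it in two DATA steps. The data set names, the OBS=10 limit, and the cutoff of 1000 for Electric are hypothetical choices made only for this example; the path placeholder must be replaced with the actual location of the file.

```sas
/* Fileref assigned to a single file (placeholder path, not a real location) */
filename Util2001 '--insert path to file here--';

/* First read: only the first 10 records of the raw file */
data work.UtilityFirst10;
  infile Util2001 obs=10;
  input Serial$ Electric Gas Water Fuel;
run;

/* Second read: all records, keeping those above a hypothetical cutoff */
data work.UtilityHighElectric;
  infile Util2001;
  input Serial$ Electric Gas Water Fuel;
  if Electric > 1000;   /* subsetting IF; the value 1000 is an assumption */
run;
```

      Both DATA steps refer to the file through the fileref alone, so a change in the file's location requires editing only the FILENAME statement.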

      Input Data 2.8.4 includes a partial representation of the first five records from a comma-delimited file (IPUMS2005Basic.csv). Due to the width of the file, Input Data 2.8.4 truncates the third and fifth records.

      Input Data 2.8.4: Comma Delimited Raw File (Partial Listing)

----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
2,Alabama,Not in identifiable city (or size group),0,4,73,Rented,N/A,0,12000,9999999
3,Alabama,Not in identifiable city (or size group),0,1,0,Rented,N/A,0,17800,9999999
4,Alabama,Not in identifiable city (or size group),0,4,73,Owned,"Yes, mortgaged/ deed
5,Alabama,Not in identifiable city (or size group),0,1,0,Rented,N/A,0,2000,9999999
6,Alabama,Not in identifiable city (or size group),0,3,97,Owned,"No, owned free and

      Not only is this file delimited by commas, but the eighth field on the third and fifth rows also includes data values containing a comma, with those values embedded in quotation marks. (Recall these records are truncated in the text due to their length so the final quote is not shown for these two records.) To successfully read this file, the DATA step must recognize the delimiter as a comma, but also that commas embedded in quoted values are not delimiters. The DSD option is introduced in Program 2.8.4 to read such a file.

      Program 2.8.4: Reading the 2005 Basic IPUMS CPS Data

      data work.Ipums2005Basic;

      infile RawData("IPUMS2005basic.csv") dsd;

      input Serial State $ City $ CityPop Metro

      CountyFIPS Ownership $ MortgageStatus $

      MortgagePayment HHIncome HomeValue;

      run;

      proc print data = work.Ipums2005Basic (obs=5);

      run;

       The DSD option included in the INFILE statement modifies the delimiter and some additional default behavior as listed below.

       Again, the INPUT statement names each of the variables read from the raw file in the INFILE statement and sets their types. By default, SAS assumes the incoming variables are numeric; however, State, City, Ownership, and MortgageStatus must be read as character values.

      Output 2.8.4 shows that, while Program 2.8.4 executes successfully, the resulting data set does not correctly represent the values from Input Data 2.8.4—the City and MortgageStatus variables are truncated. This truncation occurs due to the default length of 8 assigned to character variables; therefore, SAS did not allocate enough memory to store the values in their entirety. Only the first five records are shown; however, further investigation reveals this truncation occurs for the variable State as well.

      Output 2.8.4: Reading the 2005 Basic IPUMS CPS Data (Partial Listing)

Obs  Serial  State    City      CityPop  Metro  CountyFIPS  Ownership
  1       2  Alabama  Not in i        0      4          73  Rented
  2       3  Alabama  Not in i        0      1           0  Rented
  3       4  Alabama  Not in i        0      4          73  Owned
  4       5  Alabama  Not in i        0      1           0  Rented
  5       6  Alabama  Not in i        0      3          97  Owned

Obs  MortgageStatus  MortgagePayment  HHIncome  HomeValue
  1  N/A                           0     12000    9999999
  2  N/A                           0     17800    9999999
  3  Yes, mor                    900    185000     137500
  4  N/A                           0      2000    9999999
  5  No, owne                      0     72600      95000
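
      One common remedy for this kind of truncation is to declare character lengths with a LENGTH statement before the INPUT statement. The following is a sketch under stated assumptions rather than a definitive fix: the lengths 25, 45, and 50 are assumed values (45 exceeds the 40-character City value visible in Input Data 2.8.4) and should be checked against the full file, and the data set name work.Ipums2005Length is hypothetical.

```sas
/* Folder fileref as in Program 2.8.3 (placeholder path) */
filename RawData '--insert path to folder here--';

data work.Ipums2005Length;
  /* Assumed lengths; verify against the longest values in the file */
  length State $ 25 City $ 45 MortgageStatus $ 50;
  infile RawData("IPUMS2005basic.csv") dsd;
  input Serial State $ City $ CityPop Metro
        CountyFIPS Ownership $ MortgageStatus $
        MortgagePayment HHIncome HomeValue;
run;

proc print data = work.Ipums2005Length (obs=5);
run;
```

      Because the LENGTH statement is the first place State, City, and MortgageStatus appear, those three variables become the first columns of the resulting data set. Ownership keeps the default length of 8, which is enough for the values Rented and Owned shown above.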

      Program 2.8.4 uses the DSD option in the INFILE statement to change three default behaviors:

      1. Change the delimiter to comma

      2. Treat two consecutive delimiters as a missing value

      3. Treat delimiters inside quoted strings as part of a character value and strip off the quotation marks

      For Input Data 2.8.4, the first and third actions are necessary to successfully match the structure of the delimiters in the data since (a) the file uses commas as delimiters and (b) commas are included in the quoted strings in the data for the MortgageStatus variable. Because the file does not contain consecutive delimiters, the second modification has no effect.
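
      The three behaviors can be seen together on a small in-stream example. This is a minimal sketch: the data set name work.DsdDemo, the variable names, and the three data lines are made up for illustration and are not taken from the IPUMS file.

```sas
data work.DsdDemo;
  length Status $ 20;       /* avoid truncation at the default length of 8 */
  infile datalines dsd;     /* DSD: comma delimiter plus the two other changes */
  input ID Status $ Payment;
  datalines;
1,"Yes, mortgaged",900
2,,0
3,N/A,0
;
run;

proc print data = work.DsdDemo;
run;
```

      In the first record, the comma inside the quoted string is kept as part of Status and the quotation marks are removed. In the second record, the two consecutive commas produce a missing value for Status instead of shifting 0 into that variable.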

      Of course, it might be necessary to produce the second and third effects while using blanks—or any other character—as the delimiter. It is also often necessary to change the delimiter without making the other modifications included with the DSD option. In those cases, use the DLM= option to specify one or more delimiters by placing them in a single set of quotation marks, as shown in the following examples.

      1. DLM =
