Fundamentals of Programming in SAS. James Blum

Чтение книги онлайн.

Читать онлайн книгу Fundamentals of Programming in SAS - James Blum страница 17

Автор:
Жанр:
Серия:
Издательство:
Fundamentals of Programming in SAS - James Blum

Скачать книгу

      Output 2.3.3: Assigning Formats

StateMortgageStatusMortgagePaymentValue of HomeMETRO
South CarolinaY$200$32,5004
North CarolinaN$0$5,0001
South CarolinaY$360$75,0004
South CarolinaY$430$22,5003
North CarolinaY$450$65,0004

      Rows in a data set can be reordered using the SORT procedure to sort the data on the values of one or more variables in ascending or descending order. Program 2.3.4 sorts the BookData.Ipums2005Mini data set by the HomeValue variable.

      Program 2.3.4: Sorting Data with the SORT Procedure

      proc sort data=bookdata.ipums2005mini out=work.sorted;

      by HomeValue;

      run;

      proc print data=work.sorted(obs=5) noobs label;

      var state MortgageStatus MortgagePayment HomeValue Metro;

      label HomeValue=’Value of Home’ state=’State’;

      format HomeValue MortgagePayment dollar9. MortgageStatus $1.;

      run;

       The default behavior of the SORT procedure is to replace the input data set, specified in the DATA= option, with the sorted data set. To create a new data set from the sorted observations, use the OUT= option.

       The BY statement is required in PROC SORT and must name at least one variable. As shown in Output 2.3.4, the rows are now ordered in increasing levels of HomeValue.

      Output 2.3.4: Sorting Data with the SORT Procedure

StateMortgageStatusMortgagePaymentValue of HomeMETRO
North CarolinaN$0$5,0001
South CarolinaY$430$22,5003
North CarolinaY$300$22,5003
South CarolinaY$200$32,5004
North CarolinaN$0$45,0001

      Sorting on more than one variable gives a nested or hierarchical sorting. In those cases, values are ordered on the first variable, then for groups of records having the same value of the first variable those records are sorted on the second variable, and so forth. A specification of ascending (the default) or descending order is made for each variable. Program 2.3.5 sorts the BookData.Ipums2005Mini data set on three variables present in the data set.

      Program 2.3.5: Sorting on Multiple Variables

      proc sort data=bookdata.ipums2005mini out=work.sorted;

      by MortgagePayment descending State descending HomeValue;

      run;

      proc print data=work.sorted(obs=6) noobs label;

      var state MortgageStatus MortgagePayment HomeValue Metro;

      label HomeValue=’Value of Home’ state=’State’;

      format HomeValue MortgagePayment dollar9. MortgageStatus $1.;

      run;

       The first sort is on MortgagePayment, in ascending order. Since 0 is the lowest value and that value occurs on six records in the data set, Output 2.3.5 shows one block of records with MortgagePayment 0.

       The next sort is on State in descending order—note that the DESCENDING option precedes the variable it applies to. For the six records shown in Output 2.3.5, the first three are South Carolina and the final three are North Carolina—descending alphabetical order. Note, when sorting character data, casing matters—uppercase values are before lowercase in such a sort. For more details about determining the sort order of character data, see Chapter Note 2 in Section 2.12.

       The final sort is on HomeValue, also in descending order—note that the DESCENDING option must precede each variable it applies to. So, within each State group in Output 2.3.5, values of the HomeValue variable are in descending order.

      Output 2.3.5: Sorting on Multiple Variables

StateMortgageStatusMortgagePaymentValue of HomeMETRO
South CarolinaN$0$137,5003
South CarolinaN$0$95,0004
South CarolinaN$0$45,0003
North CarolinaN$0$162,5000
North CarolinaN$0$45,0001
North CarolinaN$0$5,0001

      Most SAS procedures, including PROC PRINT, can take advantage of BY-group processing for data that is sorted into groups. The procedure must use a BY statement that corresponds to the sorting in the data set. If the data is sorted using PROC SORT, the BY statement in a subsequent procedure does not have to completely match the BY statement in PROC SORT; however, it must match the first level of sorting if only one variable is included, the first two levels if two variables are included, and so forth. It must also match ordering, ascending or descending, on each included variable. Program 2.3.6 groups output from the PRINT procedure based on BY grouping constructed with PROC SORT.

      Program 2.3.6: BY-Group Processing in PROC PRINT

      proc sort data=bookdata.ipums2005mini out= work.sorted;

      by MortgageStatus State descending HomeValue;

      run;

      proc print data= work.sorted noobs label;

      by MortgageStatus State;

      var MortgagePayment HomeValue Metro;

      label HomeValue=’Value of Home’ state=’State’;

      format HomeValue MortgagePayment dollar9. MortgageStatus $9.;

      run;

       The original data is sorted first on MortgageStatus, then on State, and finally in descending order of HomeValue for each combination of MortgageStatus and State.

       PROC PRINT uses a BY statement matching on the MortgageStatus and State variables, which groups the output into sections based on each unique combination of values for these two variables, with the final sorting on HomeValue appearing in each table. Note that a BY statement with only MortgageStatus can be used as well, but a BY statement with only State cannot—the data is not sorted on State primarily.

      Output 2.3.6: BY-Group Processing in PROC PRINT (First 2 of 6 Groups Shown)

      MortgageStatus=No, owned State=North Carolina

MortgagePaymentValue of HomeMETRO
$0$162,5000
$0$45,0001
$0$5,0001

      MortgageStatus=No, owned State=South Carolina

MortgagePaymentValue of HomeMETRO
$0$137,5003
$0$95,0004
$0$45,0003

Скачать книгу