Fundamentals of Programming in SAS. James Blum
Чтение книги онлайн.
Читать онлайн книгу Fundamentals of Programming in SAS - James Blum страница 17
![Fundamentals of Programming in SAS - James Blum Fundamentals of Programming in SAS - James Blum](/cover_pre687444.jpg)
Output 2.3.3: Assigning Formats
State | MortgageStatus | MortgagePayment | Value of Home | METRO |
South Carolina | Y | $200 | $32,500 | 4 |
North Carolina | N | $0 | $5,000 | 1 |
South Carolina | Y | $360 | $75,000 | 4 |
South Carolina | Y | $430 | $22,500 | 3 |
North Carolina | Y | $450 | $65,000 | 4 |
2.3.2 PROC SORT and BY-Group Processing
Rows in a data set can be reordered using the SORT procedure to sort the data on the values of one or more variables in ascending or descending order. Program 2.3.4 sorts the BookData.Ipums2005Mini data set by the HomeValue variable.
Program 2.3.4: Sorting Data with the SORT Procedure
proc sort data=bookdata.ipums2005mini out=work.sorted;
by HomeValue;
run;
proc print data=work.sorted(obs=5) noobs label;
var state MortgageStatus MortgagePayment HomeValue Metro;
label HomeValue=’Value of Home’ state=’State’;
format HomeValue MortgagePayment dollar9. MortgageStatus $1.;
run;
The default behavior of the SORT procedure is to replace the input data set, specified in the DATA= option, with the sorted data set. To create a new data set from the sorted observations, use the OUT= option.
The BY statement is required in PROC SORT and must name at least one variable. As shown in Output 2.3.4, the rows are now ordered in increasing levels of HomeValue.
Output 2.3.4: Sorting Data with the SORT Procedure
State | MortgageStatus | MortgagePayment | Value of Home | METRO |
North Carolina | N | $0 | $5,000 | 1 |
South Carolina | Y | $430 | $22,500 | 3 |
North Carolina | Y | $300 | $22,500 | 3 |
South Carolina | Y | $200 | $32,500 | 4 |
North Carolina | N | $0 | $45,000 | 1 |
Sorting on more than one variable gives a nested or hierarchical sorting. In those cases, values are ordered on the first variable, then for groups of records having the same value of the first variable those records are sorted on the second variable, and so forth. A specification of ascending (the default) or descending order is made for each variable. Program 2.3.5 sorts the BookData.Ipums2005Mini data set on three variables present in the data set.
Program 2.3.5: Sorting on Multiple Variables
proc sort data=bookdata.ipums2005mini out=work.sorted;
by MortgagePayment descending State descending HomeValue;
run;
proc print data=work.sorted(obs=6) noobs label;
var state MortgageStatus MortgagePayment HomeValue Metro;
label HomeValue=’Value of Home’ state=’State’;
format HomeValue MortgagePayment dollar9. MortgageStatus $1.;
run;
The first sort is on MortgagePayment, in ascending order. Since 0 is the lowest value and that value occurs on six records in the data set, Output 2.3.5 shows one block of records with MortgagePayment 0.
The next sort is on State in descending order—note that the DESCENDING option precedes the variable it applies to. For the six records shown in Output 2.3.5, the first three are South Carolina and the final three are North Carolina—descending alphabetical order. Note, when sorting character data, casing matters—uppercase values are before lowercase in such a sort. For more details about determining the sort order of character data, see Chapter Note 2 in Section 2.12.
The final sort is on HomeValue, also in descending order—note that the DESCENDING option must precede each variable it applies to. So, within each State group in Output 2.3.5, values of the HomeValue variable are in descending order.
Output 2.3.5: Sorting on Multiple Variables
State | MortgageStatus | MortgagePayment | Value of Home | METRO |
South Carolina | N | $0 | $137,500 | 3 |
South Carolina | N | $0 | $95,000 | 4 |
South Carolina | N | $0 | $45,000 | 3 |
North Carolina | N | $0 | $162,500 | 0 |
North Carolina | N | $0 | $45,000 | 1 |
North Carolina | N | $0 | $5,000 | 1 |
Most SAS procedures, including PROC PRINT, can take advantage of BY-group processing for data that is sorted into groups. The procedure must use a BY statement that corresponds to the sorting in the data set. If the data is sorted using PROC SORT, the BY statement in a subsequent procedure does not have to completely match the BY statement in PROC SORT; however, it must match the first level of sorting if only one variable is included, the first two levels if two variables are included, and so forth. It must also match ordering, ascending or descending, on each included variable. Program 2.3.6 groups output from the PRINT procedure based on BY grouping constructed with PROC SORT.
Program 2.3.6: BY-Group Processing in PROC PRINT
proc sort data=bookdata.ipums2005mini out= work.sorted;
by MortgageStatus State descending HomeValue;
run;
proc print data= work.sorted noobs label;
by MortgageStatus State;
var MortgagePayment HomeValue Metro;
label HomeValue=’Value of Home’ state=’State’;
format HomeValue MortgagePayment dollar9. MortgageStatus $9.;
run;
The original data is sorted first on MortgageStatus, then on State, and finally in descending order of HomeValue for each combination of MortgageStatus and State.
PROC PRINT uses a BY statement matching on the MortgageStatus and State variables, which groups the output into sections based on each unique combination of values for these two variables, with the final sorting on HomeValue appearing in each table. Note that a BY statement with only MortgageStatus can be used as well, but a BY statement with only State cannot—the data is not sorted on State primarily.
Output 2.3.6: BY-Group Processing in PROC PRINT (First 2 of 6 Groups Shown)
MortgageStatus=No, owned State=North Carolina
MortgagePayment | Value of Home | METRO |
$0 | $162,500 | 0 |
$0 | $45,000 | 1 |
$0 | $5,000 | 1 |
MortgageStatus=No, owned State=South Carolina
MortgagePayment | Value of Home | METRO |
$0 | $137,500 | 3 |
$0 | $95,000 | 4 |
$0 | $45,000 | 3 |