Fundamentals of Programming in SAS. James Blum
Program 2.10.1: Comparing the Contents of Two Data Sets
proc compare base = sashelp.fish compare = sashelp.heart;
run;
If no statements beyond those shown in Program 2.10.1 are included, the complete data contents of the two data sets are compared, along with meta-information such as variable types, lengths, labels, and formats. However, PROC COMPARE only compares attributes and values for variables with common names across the two data sets. Thus, even though the full data sets are specified for comparison, an individual variable in one data set may not be compared against any variable in the other. Program 2.10.1 compares two data sets from the Sashelp library: Fish and Heart. These data sets are not intended to match, so submitting Program 2.10.1 produces a summary of the mismatches, which demonstrates the types of output available from the COMPARE procedure.
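Because only variables with common names are compared, it can help to list those shared names before running the comparison. The following is a minimal sketch, not part of the original program set, that queries the DICTIONARY.COLUMNS metadata table with PROC SQL. The column aliases (fishName, fishType, heartType) are illustrative names chosen here, and the join on uppercased names assumes the case-insensitive name matching SAS normally uses.

```sas
/* Sketch: list variable names shared by Sashelp.Fish and Sashelp.Heart. */
/* LIBNAME and MEMNAME values are stored in uppercase in                  */
/* DICTIONARY.COLUMNS, so the literals below are uppercase as well.       */
proc sql;
  select f.name as fishName  label = 'Name in Fish',
         f.type as fishType  label = 'Type in Fish',
         h.type as heartType label = 'Type in Heart'
  from dictionary.columns as f
       inner join
       dictionary.columns as h
       on upcase(f.name) = upcase(h.name)  /* match names ignoring case */
  where f.libname = 'SASHELP' and f.memname = 'FISH'
    and h.libname = 'SASHELP' and h.memname = 'HEART';
quit;
```

Any variable that does not appear in this listing has no same-named partner in the other data set, so a default PROC COMPARE step does not compare it.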
Output 2.10.1: Comparing the Contents of Two Data Sets
While the output in this text is normally delivered in an RTF format using the Output Delivery System, the output from PROC COMPARE is not well-suited to such an environment, so Output 2.10.1 shows the results as they appear in the Output window in the SAS Windowing Environment. Regardless of the destination, the output from this example of the COMPARE procedure includes sections for the following:
Data set summary—data set names, number of variables, number of observations
Variables summary—number of variables in common, along with the number in each data set which are not found in the other set
Observation summary—location of first/last unequal record and number of matching/nonmatching observations
Values comparison summary—number of variables compared with matches/mismatches, listing of mismatched variables and their differences
A review of the output provided by PROC COMPARE shows that, in this case, only two variables are compared, even though the Fish and Heart data sets contain 7 and 17 variables, respectively. This is because only two variables (Weight and Height) have names in common. As such, even if the results indicate the base and comparison data sets have no mismatches, it is important to confirm that all variables were compared before declaring the data sets identical. Similarly, the number of records compared is the minimum of the number of records in the two data sets, so the record counts must be checked as well. Several options and statements exist to alter how comparisons are done and to direct some comparison information to data sets.
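One way to confirm which variables were left out of the comparison is to ask PROC COMPARE to list them. The sketch below is an illustration rather than one of the book's numbered programs. It adds two PROC COMPARE options: LISTVAR, which lists the variables found in only one of the two data sets, and NOVALUES, which suppresses the value comparison report so the variable listing is easier to read.

```sas
/* Sketch: report the variables that exist in only one data set. */
/* LISTVAR  - list variables found in only one of the data sets  */
/* NOVALUES - suppress the value-by-value comparison report      */
proc compare base = sashelp.fish compare = sashelp.heart
             listvar novalues;
run;
```

The data set summary section of the output still reports the number of variables and observations in each data set, which is where the record counts can be checked.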
Since the Heart and Fish data sets are not expected to be similar, applying PROC COMPARE to them is a simplistic demonstration of the procedure. A more typical comparison is given in Program 2.10.2, which applies the COMPARE procedure to the data set read in by Program 2.8.8 (using fixed-position data) and the IPUMS2005Basic data set in the BookData library.
Program 2.10.2: Comparing IPUMS 2005 Basic Data Generated from Different Sources
data work.ipums2005basicSubset;
  set work.ipums2005basicFPa;
  where homeValue ne 9999999;
run;

proc compare base = BookData.ipums2005basic compare = work.ipums2005basicSubset
             out = work.diff outbase outcompare outdif outnoequal
             method = absolute criterion = 1E-9;
run;

proc print data = work.diff(obs=6);
  var _type_ _obs_ serial countyfips metro citypop homevalue;
run;

proc print data = work.diff(obs=6);
  var _type_ _obs_ city ownership;
run;
To create a data set which differs from the provided BookData.IPUMS2005Basic data set, a WHERE statement is used to remove any homes with a home value of $9,999,999.
OUT= produces a data set containing information about the differences for each pair of compared observations for all matching variables. SAS includes all compared variables and two automatic variables, _TYPE_ and _OBS_.
OUTBASE copies the record being compared in the BASE= data set into the OUT= data set.
Like OUTBASE, OUTCOMPARE copies the record being compared in the COMPARE= data set into the OUT= data set.
OUTDIF produces a record that contains the difference between the OUTBASE and OUTCOMPARE records.
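The effect of these options can be seen on a much smaller scale. The following sketch is not from the book: it assumes the Sashelp.Class data set is available and builds a hypothetical copy, work.classMod, in which a single Height value is changed. The data set names work.classMod and work.classDiff are illustrative. The NOPRINT option suppresses the usual printed report so that only the OUT= data set is produced, and OUTNOEQUAL keeps observations in which all values are judged equal out of the OUT= data set.

```sas
/* Hypothetical copy of Sashelp.Class with one altered value */
data work.classMod;
  set sashelp.class;
  if name = 'Alfred' then height = height + 1;
run;

/* Write BASE, COMPARE, and DIF records to work.classDiff,   */
/* skipping observations in which every value matches.       */
proc compare base = sashelp.class compare = work.classMod
             out = work.classDiff outbase outcompare outdif outnoequal
             noprint;
run;

/* _TYPE_ identifies the kind of record; _OBS_ gives its position */
proc print data = work.classDiff;
run;
```

Since only one observation differs, the printed data set should contain records for that observation alone, with _TYPE_ distinguishing the base record, the comparison record, and the difference record.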