Hello SAS guys,
I usually use proc compare to compare two large data sets. I only care about the no equivalence, so I use outnoequal option. My question is how can I know the variable names in the output data set.
For example, the proc compare output data is the follows:
id v1 v2............v30..............v50
I1 .. .. ............. X ............. X
I2 ......................... X .............. X
I3
.......
......
I67894 ........
How can I know the variable V30 and V50 contians X which means different without opening the output dataset.
Thank you very much
You can use ODS commands to suppress the output.
ods exclude all;
ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar
outnoequal out=dif
;
run;
ods exclude none;
Not sure why you posted a WORD document for TEXT.
Obs type batch 1 h The COMPARE Procedure 2 h Comparison of WORK.CLASS with SASHELP.CLASS 3 h (Method=EXACT) 4 h 5 h Observation Summary 6 h 7 h Observation Base Compare 8 d 9 d First Obs 1 1 10 d First Unequal 1 1 11 d Last Unequal 3 3 12 d Last Obs 19 19 13 d 14 d Number of Observations in Common: 19. 15 d Total Number of Observations Read from WORK.CLASS: 19. 16 d Total Number of Observations Read from SASHELP.CLASS: 19. 17 d 18 d Number of Observations with Some Compared Variables Unequal: 3. 19 d Number of Observations with All Compared Variables Equal: 16. 20 d 21 d 22 h Values Comparison Summary 23 h 24 d Number of Variables Compared with All Observations Equal: 2. 25 d Number of Variables Compared with Some Observations Unequal: 3. 26 d Number of Variables with Missing Value Differences: 1. 27 d Total Number of Values which Compare Unequal: 3. 28 d Maximum Difference: 12. 29 d 30 d 31 h Variables with All Equal Values 32 h 33 h Variable Type Len 34 d 35 d Sex CHAR 1 36 d Height NUM 8 37 d 38 h Variables with Unequal Values 39 h 40 h Variable Type Len Ndif MaxDif MissDif 41 d 42 d Name CHAR 8 1 0 43 d Age NUM 8 1 12.000 0 44 d Weight NUM 8 1 0 1 45 d
Sounds like you actually don't want to generate that dataset.
Please explain more what you want.
The normal report shows which variables are different.
Also the procedure will set a macro variable that you can check individual bits to see what types of differences where found if you just want an overall YES/NO status check.
I do need the compare output dataset, but I don't want to open it to check whether there are some discrepancy in individual variable or which variables have the discrepancy. I prefer to save the variable name as macro then export to excel file. When I open the excel file, I know the variables in two original datasets have discrepancies.
You can use the OUTNOEQUAL option on the PROC COMPARE statement to limit the observations to those with differences.
You can use the ODS output feature to get the summary of variable data which you could use to figure out which variables had any differences.
Here is an example:
data class;
set sashelp.class;
if _n_=1 then name='FRED';
if _n_=2 then age=1;
if _n_=3 then weight=.;
run;
ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar outnoequal out=dif;
run;
%put &=sysinfo -> %sysfunc(putn(&sysinfo,binary16));
data vars;
length Name $32 Type $4 Len Ndif MaxDif MissDif 8;
keep name -- missdif ;
do p=1 by 1 until (found);
set summ ;
found = (batch=:'Variable ');
end;
if found then do p=p+1 to nobs ;
set summ point=p nobs=nobs;
if type='d' then do;
name=scan(batch,1,' ');
type=scan(batch,2,' ');
len=input(scan(batch,3,' '),??32.);
ndif=input(scan(batch,4,' '),??32.);
maxdif=input(scan(batch,5,' '),??32.);
missdif=input(scan(batch,6,' '),??32.);
if not missing(name) then output;
end;
end;
stop;
run;
proc print;
run;
Results:
Max Miss Obs Name Type Len Ndif Dif Dif 1 Sex CHAR 1 . . . 2 Height NUM 8 . . . 3 Name CHAR 8 1 0 . 4 Age NUM 8 1 12 0 5 Weight NUM 8 1 0 1
You can then query that to get a list of variables to include in your report.
%let varlist=;
proc sql noprint;
select name into :varlist separated by ' ' from vars where ndif;
quit;
proc print data=dif(keep=_type_ _obs_ &varlist) ;
run;
Thank you for your reply. I tried running your code and what I got is the empty vars dataset. Is anything wrong with my SAS version?
Check the LOG to see if it generated the ODS output dataset.
If it did then look at the dataset generated (before the data step that tries to pick it apart) and see if it matches the format the data step is expecting. Adjust the data step to read what you see in the result. ODS output has nasty habit of being different based on the different results. For example if you remove the lines in my test program that modify the numeric variables then the report does not include the column with the MAXDIF variable. If there are no missing differences then that column does not exist.
Hello Tom,
The log window shows summ dataset is created there are 46 observations and 2 variables, while the vars dataset has 0 observations. If the two datasets are small, your code : ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar outnoequal out=dif;
run;
may work well. However, for the large datasets with millions observations, I prefer noprint option. noprint option conflicts with listequalar, how do you solve it?
I attached the capture of Summ dataset here, please tell me whether it is the same as yours. Thank you very much
You can use ODS commands to suppress the output.
ods exclude all;
ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar
outnoequal out=dif
;
run;
ods exclude none;
Not sure why you posted a WORD document for TEXT.
Obs type batch 1 h The COMPARE Procedure 2 h Comparison of WORK.CLASS with SASHELP.CLASS 3 h (Method=EXACT) 4 h 5 h Observation Summary 6 h 7 h Observation Base Compare 8 d 9 d First Obs 1 1 10 d First Unequal 1 1 11 d Last Unequal 3 3 12 d Last Obs 19 19 13 d 14 d Number of Observations in Common: 19. 15 d Total Number of Observations Read from WORK.CLASS: 19. 16 d Total Number of Observations Read from SASHELP.CLASS: 19. 17 d 18 d Number of Observations with Some Compared Variables Unequal: 3. 19 d Number of Observations with All Compared Variables Equal: 16. 20 d 21 d 22 h Values Comparison Summary 23 h 24 d Number of Variables Compared with All Observations Equal: 2. 25 d Number of Variables Compared with Some Observations Unequal: 3. 26 d Number of Variables with Missing Value Differences: 1. 27 d Total Number of Values which Compare Unequal: 3. 28 d Maximum Difference: 12. 29 d 30 d 31 h Variables with All Equal Values 32 h 33 h Variable Type Len 34 d 35 d Sex CHAR 1 36 d Height NUM 8 37 d 38 h Variables with Unequal Values 39 h 40 h Variable Type Len Ndif MaxDif MissDif 41 d 42 d Name CHAR 8 1 0 43 d Age NUM 8 1 12.000 0 44 d Weight NUM 8 1 0 1 45 d
Check the setting of the CENTER/NOCENTER option. Perhaps that is what is confusing the data step that is trying to parse the ODS output dataset.
Try:
found = (left(batch)=:'Variable ');
It is the problem which the Vars is empty. Got it and appreciate your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.