Hi,
I did a PROC COMPARE between two datasets. However, I would like the results to only show me the highest value differences for each variable and limit them each to about 10. How would I go about doing this?
This is what I currently have.
proc compare base=DATA.Base compare=DATA.Compare out= data.Result out=result outnoequal outbase outcomp outdif ;
title ' Using an Output Dataset';
run;
proc compare lists the differences as it finds them. i.e. in the order of the data set.
What you want is to calculate all the differences and extract the top 10 for each difference.
That's more analysis than proc compare does.
Here is quick macro to do what you want.
The output shows the 10 largest differences for all numeric variables in an unsorted manner.
%macro comparetop10diff(ds1=,ds2=);
proc contents data=&ds1. out=CONT1 noprint;
proc contents data=&ds2. out=CONT2 noprint;
proc sql;
select a.NAME into :varnames separated by ' ' from CONT1 a, CONT2 b where a.NAME=b.NAME and a.TYPE=1;
data TOP10DIFF;
merge &ds1. &ds2.(rename=(
%do i=1 %to %sysfunc(countw(&varnames,%str( )));
%let varname=%scan(&varnames,&i,%str( ));
&varname=COMPARETOP10DIFFVAR&i
%end;
)) end=LASTOBS;
%do i=1 %to %sysfunc(countw(&varnames,%str( )));
%let varname=%scan(&varnames,&i,%str( ));
array VAR&i.DS1 [10] (10*0);
array VAR&i.DS2 [10] (10*0);
array VAR&i.DIF [10] (10*0);
DIFF=abs(&varname.-COMPARETOP10DIFFVAR&i.);
MIN =min(of VAR&i.DIF[*]);
if DIFF > MIN then do;
POS=whichn(MIN, of VAR&i.DIF[*]);
VAR&i.DS1[POS]=&varname.;
VAR&i.DS2[POS]=COMPARETOP10DIFFVAR&i.;
VAR&i.DIF[POS]=DIFF;
end;
%end;
length VARNAME $32;
array VAL[10];
keep VARNAME VAL1-VAL10 SOURCE;
if LASTOBS then do;
%do i=1 %to %sysfunc(countw(&varnames,%str( )));
%let varname=%scan(&varnames,&i,%str( ));
VARNAME="&varname"; SOURCE='DS1'; do I=1 to 10; VAL[I]=VAR&i.DS1[I]; end; output;
VARNAME="&varname"; SOURCE='DS2'; do I=1 to 10; VAL[I]=VAR&i.DS2[I]; end; output;
VARNAME="&varname"; SOURCE='DIF'; do I=1 to 10; VAL[I]=VAR&i.DIF[I]; end; output;
%end;
end;
run;
%mend;
data CLASS; set SASHELP.CLASS; AGE=AGE/2; WEIGHT=WEIGHT+1; HEIGHT=HEIGHT+1; run;
%comparetop10diff(ds1=SASHELP.CLASS,ds2=CLASS);
VARNAME | VAL1 | VAL2 | VAL3 | VAL4 | VAL5 | VAL6 | VAL7 | VAL8 | VAL9 | VAL10 | SOURCE |
---|---|---|---|---|---|---|---|---|---|---|---|
Age | 14.0 | 15.0 | 15.0 | 14.0 | 14.0 | 14.0 | 15.0 | 15.0 | 13.0 | 16.0 | DS1 |
Age | 7.0 | 7.5 | 7.5 | 7.0 | 7.0 | 7.0 | 7.5 | 7.5 | 6.5 | 8.0 | DS2 |
Age | 7.0 | 7.5 | 7.5 | 7.0 | 7.0 | 7.0 | 7.5 | 7.5 | 6.5 | 8.0 | DIF |
Height | 69.0 | 56.5 | 65.3 | 62.8 | 63.5 | 57.3 | 59.8 | 62.5 | 62.5 | 59.0 | DS1 |
Height | 70.0 | 57.5 | 66.3 | 63.8 | 64.5 | 58.3 | 60.8 | 63.5 | 63.5 | 60.0 | DS2 |
Height | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | DIF |
Weight | 112.5 | 84.0 | 98.0 | 102.5 | 102.5 | 83.0 | 84.5 | 112.5 | 84.0 | 99.5 | DS1 |
Weight | 113.5 | 85.0 | 99.0 | 103.5 | 103.5 | 84.0 | 85.5 | 113.5 | 85.0 | 100.5 | DS2 |
Weight | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | DIF |
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.