08-06-2017 06:11 PM
Hi folks, I have a manager who wants to see all the differences in a comparison between two datasets in a separate dataset.
At the moment I'm using proc compare but the maximum number of differences is 32000, which may not accomodate all the differences.
If I use the 'outnoequal' option it only lists the difference as 'X' for character fields or as the difference for numerical fields, this is not what the manager wants as he needs to see the actual values.
Any suggestions on how to achieve this? I've about 50 datasets I need to compare.
*sample data to check; data class; set sashelp.class; if age=12 then weight=50; if age=14 then sex="T"; run; *proc compare test; proc compare data=class compare=sashelp.class out=check outnoequal; id name; run;
What I would like to see in the difference dataset is something like this:
If I use the outnoequal option I only see this:
Any help is great appreciated.
08-06-2017 07:33 PM
When using outnoequals the X is used to show where there is a difference between character variables, an E where there is no difference in numeric variables and the difference otherwise.
I think the only way you're going to do this is to split your files up by id value and run individual Proc Compares against the susbsets although to be honest I'd try to push back against the requirement - is he really going to look at each and every one of those 32000+ differences?
08-06-2017 07:57 PM
Hi @ChrisBrooks, I've already tried to push back on the requirement without much luck. Do I think he will look all the differences, most likely he won't.
Which files are suggesting I slipt out?
Also tried running proc compare with ods output comparedifferences=check2; but still I get that 32k limitation.
Thanks for your taking your time to look at this respond, greatly appreciated.
08-06-2017 10:07 PM
Is your manager going to read 32,000 differences?
Maybe a better report is needed. What is the actual goal?
08-06-2017 11:25 PM
@ChrisNZ there is a project where a team is migrating the data we use from on-premise to the cloud. So we need to compare the dataset generated on-premise to the one generated in the cloud for any differences.
We may have a run-time column differences which is pretty obvious once you see the first few rows but he wants to see the whole thing.
Out of interest is there a neat way of outputting those 32000 rows to a dataset?
08-07-2017 12:15 AM
Good luck reading so many values.
Anyway, this may get you started:
data CLASS2; set SASHELP.CLASS; if ranuni(0) > .9 then AGE=AGE*1.5; if ranuni(0) > .9 then WEIGHT=WEIGHT*1.5; if ranuni(0) > .9 then HEIGHT=HEIGHT*1.5; if ranuni(0) > .9 then NAME=catt(NAME,_N_); if ranuni(0) > .9 then SEX=catt(SEX,_N_); run; proc contents data=SASHELP.CLASS out=CONTENTS noprint; proc sql; select NAME into :names separated by ' ' from CONTENTS; quit; %macro all_differences; data COMP; merge SASHELP.CLASS(rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB1_VAR&i. %end; )) CLASS2 (rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB2_VAR&i. %end; )); OBSNB=_N_; %do i=1 %to &sqlobs.; %scan(&names.,&i.)=ifc( TAB1_VAR&i. = TAB2_VAR&i. , catt('Same value:', TAB1_VAR&i.) , catt(TAB1_VAR&i., '~', TAB2_VAR&i. )); KEEP=max(KEEP,index(%scan(&names,&i.),'~')); %end; if KEEP; run; %mend; %all_differences; proc print noobs; var OBSNB &names.; run;
|1||Same value:14||69~103.5||Same value:Alfred||Same value:M||112.5~168.75|
|3||Same value:13||Same value:65.3||Barbara~Barbara3||Same value:F||Same value:98|
|5||Same value:14||Same value:63.5||Same value:Henry||Same value:M||102.5~153.75|
|11||Same value:11||Same value:51.3||Joyce~Joyce11||Same value:F||Same value:50.5|
|12||Same value:14||64.3~96.45||Same value:Judy||Same value:F||Same value:90|
|16||12~18||Same value:64.8||Same value:Robert||Same value:M||Same value:128|
|18||Same value:11||57.5~86.25||Same value:Thomas||Same value:M||Same value:85|
08-07-2017 08:54 AM
The following SQL could give you a start .
data one; set sashelp.class; if age=12 then weight=50; if age=14 then sex="T"; run; data two; set sashelp.class; run; proc sql; create table in_one_not_in_two as select * from one except select * from two ; create table in_two_not_in_one as select * from two except select * from one; quit;
08-08-2017 09:30 AM
Would something like this work?