DATA Step, Macro, Functions and more

List all differences between two datasets in another dataset

Reply
New Contributor
Posts: 4

List all differences between two datasets in another dataset

Hi folks, I have a manager who wants to see all the differences in a comparison between two datasets in a separate dataset.

 

At the moment I'm using proc compare but the maximum number of differences is 32000, which may not accomodate all the differences.

 

If I use the 'outnoequal' option it only lists the difference as 'X' for character fields or as the difference for numerical fields, this is not what the manager wants as he needs to see the actual values.

 

Any suggestions on how to achieve this? I've about 50 datasets I need to compare.

 

 

*sample data to check;
data class;
set sashelp.class;

if age=12 then weight=50;
if age=14 then sex="T";
run;

*proc compare test;
proc compare data=class compare=sashelp.class out=check outnoequal;
id name;
run;

What I would like to see in the difference dataset is something like this:

Capture.PNG 

 

If I use the outnoequal option I only see this:

 

Capture1.PNG

Any help is great appreciated.

Regular Contributor
Posts: 190

Re: List all differences between two datasets in another dataset

When using outnoequals the X is used to show where there is a difference between character variables, an E where there is no difference in numeric variables and the difference otherwise.

 

I think the only way you're going to do this is to split your files up by id value and run individual Proc Compares against the susbsets although to be honest I'd try to push back against the requirement - is he really going to look at each and every one of those 32000+ differences? 

New Contributor
Posts: 4

Re: List all differences between two datasets in another dataset

Hi @ChrisBrooks, I've already tried to push back on the requirement without much luck. Do I think he will look all the differences, most likely he won't.

 

Which files are suggesting I slipt out?

 

Also tried running proc compare with ods output comparedifferences=check2; but still I get that 32k limitation.

 

Thanks for your taking your time to look at this respond, greatly appreciated.

PROC Star
Posts: 1,570

Re: List all differences between two datasets in another dataset

Is your manager going to read 32,000 differences?

Maybe a better report is needed. What is the actual goal?

New Contributor
Posts: 4

Re: List all differences between two datasets in another dataset

@ChrisNZ there is a project where a team is migrating the data we use from on-premise to the cloud. So we need to compare the dataset generated on-premise to the one generated in the cloud for any differences.

 

We may have a run-time column differences which is pretty obvious once you see the first few rows but he wants to see the whole thing. 

 

Out of interest is there a neat way of outputting those 32000 rows to a dataset?

 

 

PROC Star
Posts: 1,570

Re: List all differences between two datasets in another dataset

Good luck reading so many values.

Anyway, this may get you started:

data CLASS2;
  set SASHELP.CLASS;
  if ranuni(0) > .9 then AGE=AGE*1.5; 
  if ranuni(0) > .9 then WEIGHT=WEIGHT*1.5; 
  if ranuni(0) > .9 then HEIGHT=HEIGHT*1.5; 
  if ranuni(0) > .9 then NAME=catt(NAME,_N_);
  if ranuni(0) > .9 then SEX=catt(SEX,_N_);
run;

proc contents data=SASHELP.CLASS out=CONTENTS noprint;
proc sql; select NAME into :names separated by ' ' from CONTENTS; quit;

%macro all_differences;
data COMP;
  merge SASHELP.CLASS(rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB1_VAR&i. %end; ))
        CLASS2       (rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB2_VAR&i. %end; ));
  OBSNB=_N_;
  %do i=1 %to &sqlobs.; 
    %scan(&names.,&i.)=ifc( TAB1_VAR&i. = TAB2_VAR&i.
                          , catt('Same value:', TAB1_VAR&i.)
                          , catt(TAB1_VAR&i., '~', TAB2_VAR&i. ));
    KEEP=max(KEEP,index(%scan(&names,&i.),'~'));
  %end;
  if KEEP;
run;
%mend;
%all_differences;
proc print noobs; 
  var OBSNB &names.;
run;

 

OBSNB Age Height Name Sex Weight
1 Same value:14 69~103.5 Same value:Alfred Same value:M 112.5~168.75
3 Same value:13 Same value:65.3 Barbara~Barbara3 Same value:F Same value:98
5 Same value:14 Same value:63.5 Same value:Henry Same value:M 102.5~153.75
11 Same value:11 Same value:51.3 Joyce~Joyce11 Same value:F Same value:50.5
12 Same value:14 64.3~96.45 Same value:Judy Same value:F Same value:90
16 12~18 Same value:64.8 Same value:Robert Same value:M Same value:128
18 Same value:11 57.5~86.25 Same value:Thomas Same value:M Same value:85
Super User
Posts: 9,691

Re: List all differences between two datasets in another dataset

The following SQL could give you a start .

data one;
set sashelp.class;
if age=12 then weight=50;
if age=14 then sex="T";
run;
data two;
 set sashelp.class;
run;

proc sql;
create table in_one_not_in_two as
 select * from one
 except
 select * from two ;

create table in_two_not_in_one as
 select * from two
 except
 select * from one;
quit;
New Contributor
Posts: 4

Re: List all differences between two datasets in another dataset

Thanks @ChrisNZ and @Ksharp, I'll play around with these and report back.

Contributor
Posts: 27

Re: List all differences between two datasets in another dataset

Hi @metadata,

 

Would something like this work?

 

data local;
        row_id = 1;
        x=1;y=2;z=3;a=4;
        output;
 
        row_id = 2;
        x=6;y=7;z=1;a=7;
        output;
run;
 
data cloud;
        row_id = 1;
        x=1;y=2;z=3;a=5;
        output;
 
        row_id = 2;
        x=6;y=7;z=1;a=7;
        output;
run;
 
proc sql;
        create table diff as
        select
                f1.x, f2.x as x_cloud,
                f1.y, f2.y as y_cloud,
                f1.z, f2.z as z_cloud,
                f1.a, f2.a as a_cloud
        from local f1
        inner join cloud f2
        on f1.row_id = f2.row_id
        where
                f1.x <> f2.x or
                f1.y <> f2.y or
                f1.z <> f2.z or
                f1.a <> f2.a ;
quit;
Ask a Question
Discussion stats
  • 8 replies
  • 254 views
  • 0 likes
  • 5 in conversation