BookmarkSubscribeRSS Feed
metadata
Calcite | Level 5

Hi folks, I have a manager who wants to see all the differences in a comparison between two datasets in a separate dataset.

 

At the moment I'm using proc compare but the maximum number of differences is 32000, which may not accomodate all the differences.

 

If I use the 'outnoequal' option it only lists the difference as 'X' for character fields or as the difference for numerical fields, this is not what the manager wants as he needs to see the actual values.

 

Any suggestions on how to achieve this? I've about 50 datasets I need to compare.

 

 

*sample data to check;
data class;
set sashelp.class;

if age=12 then weight=50;
if age=14 then sex="T";
run;

*proc compare test;
proc compare data=class compare=sashelp.class out=check outnoequal;
id name;
run;

What I would like to see in the difference dataset is something like this:

Capture.PNG 

 

If I use the outnoequal option I only see this:

 

Capture1.PNG

Any help is great appreciated.

13 REPLIES 13
ChrisBrooks
Ammonite | Level 13

When using outnoequals the X is used to show where there is a difference between character variables, an E where there is no difference in numeric variables and the difference otherwise.

 

I think the only way you're going to do this is to split your files up by id value and run individual Proc Compares against the susbsets although to be honest I'd try to push back against the requirement - is he really going to look at each and every one of those 32000+ differences? 

metadata
Calcite | Level 5

Hi @ChrisBrooks, I've already tried to push back on the requirement without much luck. Do I think he will look all the differences, most likely he won't.

 

Which files are suggesting I slipt out?

 

Also tried running proc compare with ods output comparedifferences=check2; but still I get that 32k limitation.

 

Thanks for your taking your time to look at this respond, greatly appreciated.

ChrisNZ
Tourmaline | Level 20

Is your manager going to read 32,000 differences?

Maybe a better report is needed. What is the actual goal?

metadata
Calcite | Level 5

@ChrisNZ there is a project where a team is migrating the data we use from on-premise to the cloud. So we need to compare the dataset generated on-premise to the one generated in the cloud for any differences.

 

We may have a run-time column differences which is pretty obvious once you see the first few rows but he wants to see the whole thing. 

 

Out of interest is there a neat way of outputting those 32000 rows to a dataset?

 

 

ChrisNZ
Tourmaline | Level 20

Good luck reading so many values.

Anyway, this may get you started:

data CLASS2;
  set SASHELP.CLASS;
  if ranuni(0) > .9 then AGE=AGE*1.5; 
  if ranuni(0) > .9 then WEIGHT=WEIGHT*1.5; 
  if ranuni(0) > .9 then HEIGHT=HEIGHT*1.5; 
  if ranuni(0) > .9 then NAME=catt(NAME,_N_);
  if ranuni(0) > .9 then SEX=catt(SEX,_N_);
run;

proc contents data=SASHELP.CLASS out=CONTENTS noprint;
proc sql; select NAME into :names separated by ' ' from CONTENTS; quit;

%macro all_differences;
data COMP;
  merge SASHELP.CLASS(rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB1_VAR&i. %end; ))
        CLASS2       (rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB2_VAR&i. %end; ));
  OBSNB=_N_;
  %do i=1 %to &sqlobs.; 
    %scan(&names.,&i.)=ifc( TAB1_VAR&i. = TAB2_VAR&i.
                          , catt('Same value:', TAB1_VAR&i.)
                          , catt(TAB1_VAR&i., '~', TAB2_VAR&i. ));
    KEEP=max(KEEP,index(%scan(&names,&i.),'~'));
  %end;
  if KEEP;
run;
%mend;
%all_differences;
proc print noobs; 
  var OBSNB &names.;
run;

 

OBSNB Age Height Name Sex Weight
1 Same value:14 69~103.5 Same value:Alfred Same value:M 112.5~168.75
3 Same value:13 Same value:65.3 Barbara~Barbara3 Same value:F Same value:98
5 Same value:14 Same value:63.5 Same value:Henry Same value:M 102.5~153.75
11 Same value:11 Same value:51.3 Joyce~Joyce11 Same value:F Same value:50.5
12 Same value:14 64.3~96.45 Same value:Judy Same value:F Same value:90
16 12~18 Same value:64.8 Same value:Robert Same value:M Same value:128
18 Same value:11 57.5~86.25 Same value:Thomas Same value:M Same value:85
Ksharp
Super User

The following SQL could give you a start .

data one;
set sashelp.class;
if age=12 then weight=50;
if age=14 then sex="T";
run;
data two;
 set sashelp.class;
run;

proc sql;
create table in_one_not_in_two as
 select * from one
 except
 select * from two ;

create table in_two_not_in_one as
 select * from two
 except
 select * from one;
quit;
metadata
Calcite | Level 5

Thanks @ChrisNZ and @Ksharp, I'll play around with these and report back.

Mjoli
Calcite | Level 5

i know this but i would like to compare each and every variable from both tables and conclude whether they are the same or changed

Ksharp
Super User

Then try PROC COMPARE .

JohnSAScom
Quartz | Level 8

Hi @metadata,

 

Would something like this work?

 

data local;
        row_id = 1;
        x=1;y=2;z=3;a=4;
        output;
 
        row_id = 2;
        x=6;y=7;z=1;a=7;
        output;
run;
 
data cloud;
        row_id = 1;
        x=1;y=2;z=3;a=5;
        output;
 
        row_id = 2;
        x=6;y=7;z=1;a=7;
        output;
run;
 
proc sql;
        create table diff as
        select
                f1.x, f2.x as x_cloud,
                f1.y, f2.y as y_cloud,
                f1.z, f2.z as z_cloud,
                f1.a, f2.a as a_cloud
        from local f1
        inner join cloud f2
        on f1.row_id = f2.row_id
        where
                f1.x <> f2.x or
                f1.y <> f2.y or
                f1.z <> f2.z or
                f1.a <> f2.a ;
quit;
sahoositaram555
Pyrite | Level 9
Hi @JohnSAScom,
Thanks for introducing such a wonderful method via proc sql. Could you please let us know what exactly does this <> mean ?
ChrisNZ
Tourmaline | Level 20

> Could you please let us know what exactly does this <> mean ? 

You haven't read the log after running the process have you?

Always read the log.

Even if there are no errors.

Always.

 

sahoositaram555
Pyrite | Level 9
Thanks @ChrisNZ for your kind advice. I've got my answer .

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 14865 views
  • 2 likes
  • 7 in conversation