Hi folks, I have a manager who wants to see all the differences in a comparison between two datasets in a separate dataset.
At the moment I'm using proc compare but the maximum number of differences is 32000, which may not accomodate all the differences.
If I use the 'outnoequal' option it only lists the difference as 'X' for character fields or as the difference for numerical fields, this is not what the manager wants as he needs to see the actual values.
Any suggestions on how to achieve this? I've about 50 datasets I need to compare.
*sample data to check;
data class;
set sashelp.class;
if age=12 then weight=50;
if age=14 then sex="T";
run;
*proc compare test;
proc compare data=class compare=sashelp.class out=check outnoequal;
id name;
run;
What I would like to see in the difference dataset is something like this:
If I use the outnoequal option I only see this:
Any help is great appreciated.
When using outnoequals the X is used to show where there is a difference between character variables, an E where there is no difference in numeric variables and the difference otherwise.
I think the only way you're going to do this is to split your files up by id value and run individual Proc Compares against the susbsets although to be honest I'd try to push back against the requirement - is he really going to look at each and every one of those 32000+ differences?
Hi @ChrisBrooks, I've already tried to push back on the requirement without much luck. Do I think he will look all the differences, most likely he won't.
Which files are suggesting I slipt out?
Also tried running proc compare with ods output comparedifferences=check2; but still I get that 32k limitation.
Thanks for your taking your time to look at this respond, greatly appreciated.
Is your manager going to read 32,000 differences?
Maybe a better report is needed. What is the actual goal?
@ChrisNZ there is a project where a team is migrating the data we use from on-premise to the cloud. So we need to compare the dataset generated on-premise to the one generated in the cloud for any differences.
We may have a run-time column differences which is pretty obvious once you see the first few rows but he wants to see the whole thing.
Out of interest is there a neat way of outputting those 32000 rows to a dataset?
Good luck reading so many values.
Anyway, this may get you started:
data CLASS2;
set SASHELP.CLASS;
if ranuni(0) > .9 then AGE=AGE*1.5;
if ranuni(0) > .9 then WEIGHT=WEIGHT*1.5;
if ranuni(0) > .9 then HEIGHT=HEIGHT*1.5;
if ranuni(0) > .9 then NAME=catt(NAME,_N_);
if ranuni(0) > .9 then SEX=catt(SEX,_N_);
run;
proc contents data=SASHELP.CLASS out=CONTENTS noprint;
proc sql; select NAME into :names separated by ' ' from CONTENTS; quit;
%macro all_differences;
data COMP;
merge SASHELP.CLASS(rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB1_VAR&i. %end; ))
CLASS2 (rename=(%do i=1 %to &sqlobs.; %scan(&names.,&i)=TAB2_VAR&i. %end; ));
OBSNB=_N_;
%do i=1 %to &sqlobs.;
%scan(&names.,&i.)=ifc( TAB1_VAR&i. = TAB2_VAR&i.
, catt('Same value:', TAB1_VAR&i.)
, catt(TAB1_VAR&i., '~', TAB2_VAR&i. ));
KEEP=max(KEEP,index(%scan(&names,&i.),'~'));
%end;
if KEEP;
run;
%mend;
%all_differences;
proc print noobs;
var OBSNB &names.;
run;
OBSNB | Age | Height | Name | Sex | Weight |
---|---|---|---|---|---|
1 | Same value:14 | 69~103.5 | Same value:Alfred | Same value:M | 112.5~168.75 |
3 | Same value:13 | Same value:65.3 | Barbara~Barbara3 | Same value:F | Same value:98 |
5 | Same value:14 | Same value:63.5 | Same value:Henry | Same value:M | 102.5~153.75 |
11 | Same value:11 | Same value:51.3 | Joyce~Joyce11 | Same value:F | Same value:50.5 |
12 | Same value:14 | 64.3~96.45 | Same value:Judy | Same value:F | Same value:90 |
16 | 12~18 | Same value:64.8 | Same value:Robert | Same value:M | Same value:128 |
18 | Same value:11 | 57.5~86.25 | Same value:Thomas | Same value:M | Same value:85 |
The following SQL could give you a start .
data one;
set sashelp.class;
if age=12 then weight=50;
if age=14 then sex="T";
run;
data two;
set sashelp.class;
run;
proc sql;
create table in_one_not_in_two as
select * from one
except
select * from two ;
create table in_two_not_in_one as
select * from two
except
select * from one;
quit;
i know this but i would like to compare each and every variable from both tables and conclude whether they are the same or changed
Then try PROC COMPARE .
Hi @metadata,
Would something like this work?
> Could you please let us know what exactly does this <> mean ?
You haven't read the log after running the process have you?
Always read the log.
Even if there are no errors.
Always.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.