Hi Experts,
I have two similar datasets which has most of the values as same but few of the values in some columns are different. I want to compare both of them and have only those rows which have those values as different. I have imported both the datasets and sorted it. Then I have merged the dataset based on the common variable i.e., DM9_Consumer_ID. I tried Proc Compare but it does the overall comparison and does not give the DM9_Consumer_IDs which have inconsistency. Can you please suggest what is the best way to find the DM9_Consumer_ID which have data difference? Here are both the sample datasets:
Proc sort data=existing;
by DM9_Consumer_ID;
run;
Proc sort data=revised;
by DM9_Consumer_ID;
run;
Data merged;
merge existing (in=a) revised (in=b);
by DM9_Consumer_ID;
if a and b;
run;
/*Compare the difference*/
proc compare
base = existing
compare = revised;
run;
Use
PROC COMPARE statement options:
NOPRINT
OUT=
OUTBASE
OUTCOMP
OUTDIF
OUTNOEQUAL
OUTSTATS= ;
Koen
Use the BY DM_Consumer_ID option in Proc Compare as well.
If you are particularly interested in just a few particular variables use a VAR statement Proc Compare to reduce unwanted output.
Data step merge is going to be completely useless of you want to find different values of the variables other than BY variables because of the merge treats like-named variables.
Start with
proc compare base = existing compare = revised
OUTNOEQUAL OUT=WORK.NOEQUAL
; by DM9_Consumer_ID; var Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment; run;
The output data set WORK.NOEQUAL will have the value of DM9_consumer_id and the Var variables with the difference of values or the special missing value that displays as E for equal values.
The key part is that Work.Noequal has the By values that have at least one non-equal value in the VAR variables.
If that is sufficient for your "inconsistent" then add a KEEP=DM9_consumer_id data set option to Work.Noequal.
Data Existing;
infile cards expandtabs;
input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;
datalines ;
D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98
D0482 38926.17 24643.95 20899.13 20899.13 38926.17 41600.88
D0645 16856.41 499.50 16491.03 16491.03 16856.41 31175.61
D4146 45732.36 -103457.96 45732.36 45732.36 45732.36 86455.14
D9007 7045.11 -15894.38 7045.11 7045.11 7045.11 13318.49
D9012 24650.07 -28707.57 24650.07 24650.07 24650.07 46599.93
D3024 8495.44 -24975.15 8495.44 8495.44 8495.44 16910.65
D3063 7098.62 302.89 6795.73 6795.73 7098.62 12847.05
;
run;
Data Revised;
infile cards expandtabs;
input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;
datalines ;
D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98
D0482 24643.95 0.00 0.00 18027.05 41600.88 18027.05
D0645 499.50 0.00 0.00 365.38 31175.61 365.38
D4146 -103457.96 45732.36 45732.36 45732.36 86455.14 45732.36
D9007 -15894.38 7045.11 7045.11 7045.11 13318.49 7045.11
D9012 -28707.57 24650.07 24650.07 24650.07 46599.93 24650.07
D3024 -24975.15 8495.44 8495.44 8495.44 16910.65 8495.44
D3063 302.89 6795.73 6795.73 7098.62 12847.05 7098.62
;
run;
%let old_data= c:\temp\a ; *前一次数据 的路径;
%let new_data= c:\temp\b ; *新数据 的路径;
libname old v9 "&old_data." ;
data old.have;
set Existing;
run;
libname new v9 "&new_data." ;
data new.have;
set Revised;
run;
%let subjid= DM9_Consumer_ID; *受试者编号的变量名. 注意:所有的数据集中是一样的,且值是唯一的;
%let drop= UPDATE_TIME ; *需要去掉的变量名;
options validvarname=any validmemname=extend mrecall nofmterr nonumber nodate dkricond=nowarn dkrocond=nowarn;
/*******比较 前一次数据 和 新数据********/
libname old v9 "&old_data." access=readonly;
libname new v9 "&new_data." access=readonly;
%macro type_one(idx=,dsn=,title=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&dsn.(drop=&drop.) out=old;by &subjid. ;run;
proc sort data=new.&dsn.(drop=&drop.) out=new;by &subjid. ;run;
proc compare data=old compare=new noprint out=diff_&dsn.;
id &subjid.;
run;
data call_define;
set diff_&dsn.(drop=_TYPE_ _OBS_);
retain _dummy_ .;
array n{*} _numeric_;
array c{*} _character_;
length vname $ 40;
do i=1 to dim(n);
if n{i} not in (. 0) then do;vname=vname(n{i}); if upcase(vname) not in ("%upcase(&subjid.)") then output;end;
end;
do i=1 to dim(c);
if findc(c{i},'X') then do;vname=vname(c{i}); if upcase(vname) not in ("%upcase(&subjid.)") then output;end;
end;
keep &subjid. vname;
run;
filename t temp;
data _null_;
set call_define;
file t;
/* put 'if &subjid.="' &subjid. '" then call define("' vname '","style","style={background=yellow}");';*/
put "if &subjid.='" &subjid. "' then call define('" vname "','style','style={background=yellow}');";
run;
data &dsn.;
if _n_=1 then do;
length new_change $ 40;
if 0 then set old.&dsn.;
declare hash h(dataset:"old.&dsn.");
h.definekey("&subjid.");
h.definedone();
declare hash h1(dataset:'call_define');
h1.definekey("&subjid.");
h1.definedone();
end;
set new.&dsn.;
new_change='old' ;
if h.check() ne 0 then new_change='new' ;
if h1.check() = 0 then new_change='change' ;
label new_change='数据状态';
run;
/*ods excel file="&out.\&idx..&dsn..xlsx" options(sheet_name="&dsn." autofilter='all' ) ;*/
proc report data=&dsn. split='*' nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
endcomp;
run;
/*ods excel close;*/
%mend;
%type_one(idx=1,dsn=have,title=标题)
You could see I compare two SAS TABLES, not EXCEL file, I think you should import these excel file into sas firstly.
And %let old_data= is a path/directory/folder , NOT a file.
So is "%let new_data= ".
Here is an example:
%let old_data= c:\Users\56029\Desktop\Compare1\ ;
%let new_data= c:\Users\56029\Desktop\Compare2\ ;
Create two folders "Compare1\" and "Compare2\" by hand, if you do not have under your OS.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.