data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;
run;
data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;
run;
I am attempting to create a dataset where the contents of both tables DO NOT EQUAL
So in this case I want the first two rows in both are the same so i want to eliminate them. I then want to create a dataset of the unique values
Can this be done in a proc sql or datastep
It's not clear why the variables have different names in the two data sets, but let's take that as a given. Then:
proc sort data=a;
by my1 my2 clm drg tin;
run;
proc sort data=b;
by cust1 cust2 clm drg tin;
run;
data unique;
set a (in=ina rename=(my1=cust1 my2=cust2))
b;
by cust1 cust2 clm drg tin;
if first.tin and last.tin;
if in1 then source='Dataset A';
else source3='Dataset B';
run;
Hi,
proc sql;
select * from
(select * from a
except
select * from b)
union
select * from
(select * from b
except
select * from a)
;
quit;
Also,try PROC COMPARE which gives more information that might help you.
proc compare data=a compare=b;
run;
data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;
run;
data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;
proc sql;
create table want as
(select * from a
except all
select * from b)
union all
(select * from b
except all
select * from a);
quit;
Oh well @Q1983 I failed to notice the similar approach by @SuryaKiran that was posted earlier. Ignore mine! Thanks
@novinosrin Although it's almost similar you have covered one extra point by mentioning ALL, which will also include duplicate records if their are any.
Yes thank you, but you did post it and I missed to see it which I generally don't. Anyways, indeed probably the best approach by any means right from convenience of coding, to maintenance and change requests by anybody regardless of skills level in my opinion. Well done! & cheers!
data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;
run;
data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;
data _null_;
if _n_=1 then do;
if 0 then set a;
dcl hash H (dataset:'a',multidata:'y',ordered:'y') ;
h.definekey ('my1','my2','clm','drg','tin') ;
h.definedone () ;
end;
set b end=l;
rc= h.check(key:cust1,key:cust2,key:clm,key:drg,key:tin) ;
if rc=0 then rc1=h.remove(key:cust1,key:cust2,key:clm,key:drg,key:tin);
else h.add(key:cust1,key:cust2,key:clm,key:drg,key:tin,data:cust1,data:cust2,data:clm,data:drg,data:tin);
if l then h.output(dataset:'want');
run;
Here's a single data step solution that doesn't require sorting.
data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
run;
data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
run;
data want1 want2;
set a b;
if _n_=1 then do;
declare hash h (dataset:'a (obs=0))';
h.definekey(all:'Y');
h.definedata(all:'Y');
h.definedone();
end;
if h.check^=0 then do;
output want1;
h.add();
end;
else output want2;
run;
The strategy here is
read a record
if this is first time for these values (i.e. not in the hash object) then output want1 and put it in the hash object for later checking
but if it's already in the hash then output want2.
data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;
run;
data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;
proc sql;
create table want(drop=t) as
select *
from (select *, 1 as t from a
union all
select *, 2 as t from b)
group by my1,my2,clm,drg,tin
having count(distinct t)=1;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.