BookmarkSubscribeRSS Feed
Q1983
Lapis Lazuli | Level 10

data a;

input my1:$2. my2:$2. clm drg tin;

datalines;

 

aa bb 1 2 3

bb cc 2 3 4

dd ee 3 4 5

ff gg 4 5 6

hh ii 6 7 8

;

run;

data b;

input cust1:$2. cust2:$2. clm drg tin;

datalines;

 

aa bb 1 2 3

bb cc 2 3 4

zz xx 3 4 5

yy ww 4 5 6

hh ii 2 7 8

;

run;

I am attempting to create a dataset where the contents of both tables DO NOT EQUAL

 

So in this case I want the first two rows in both are the same so i want to eliminate them.  I then want to create a dataset of the unique values

Can this be done in a proc sql or datastep

 

10 REPLIES 10
Reeza
Super User
Can you please show what you'd like as an output? Is it ok to rename the first data set. Is this a one time thing, or something that needs to scale to other data sets? Or will be run multiple times?
Astounding
PROC Star

It's not clear why the variables have different names in the two data sets, but let's take that as a given.  Then:

 

proc sort data=a;

   by my1 my2 clm drg tin;

run;

 

proc sort data=b;

   by cust1 cust2 clm drg tin;

run;

 

data unique;

set a (in=ina rename=(my1=cust1 my2=cust2))

b;

by cust1 cust2 clm drg tin;

if first.tin and last.tin;

if in1 then source='Dataset A';

else source3='Dataset B';

run;

SuryaKiran
Meteorite | Level 14

Hi,

 

proc sql;
select * from 
(select * from a
	except 
 select * from b)
union 
select * from 
(select * from b
	except 
 select * from a)
;
quit;

image.png

 

Also,try PROC COMPARE which gives more information that might help you.

proc compare data=a compare=b;
run;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

data a;

input my1:$2. my2:$2. clm drg tin;

datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;

run;

data b;

input cust1:$2. cust2:$2. clm drg tin;

datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;

proc sql;
create table want as
(select * from a
except all
select * from b)
union all
(select * from b
except all
select * from a);
quit;
novinosrin
Tourmaline | Level 20

Oh well @Q1983 I failed to notice the similar approach by @SuryaKiran that was posted earlier. Ignore mine! Thanks

SuryaKiran
Meteorite | Level 14

@novinosrin Although it's almost similar you have covered one extra point by mentioning ALL, which will also include duplicate records if their are any.

Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

Yes thank you, but you did post it and I missed to see it which I generally don't. Anyways,  indeed probably the best approach by any means right from convenience of coding, to maintenance and change requests by anybody regardless of skills level in my opinion. Well done! & cheers!

novinosrin
Tourmaline | Level 20
data a;

input my1:$2. my2:$2. clm drg tin;

datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;

run;

data b;

input cust1:$2. cust2:$2. clm drg tin;

datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;

data _null_;
if _n_=1 then do;
if 0 then set a;
   dcl hash H (dataset:'a',multidata:'y',ordered:'y') ;
   h.definekey  ('my1','my2','clm','drg','tin') ;
   h.definedone () ;
end;
set b end=l;
rc= h.check(key:cust1,key:cust2,key:clm,key:drg,key:tin) ;
if rc=0 then rc1=h.remove(key:cust1,key:cust2,key:clm,key:drg,key:tin);
else h.add(key:cust1,key:cust2,key:clm,key:drg,key:tin,data:cust1,data:cust2,data:clm,data:drg,data:tin);
if l then h.output(dataset:'want');
run;
mkeintz
PROC Star

Here's a  single data step solution that doesn't require sorting.

 

data a;
  input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
run;

data b;
  input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
run;

data want1 want2;
  set a b;
  if _n_=1 then do;
    declare hash h (dataset:'a (obs=0))';
      h.definekey(all:'Y');
      h.definedata(all:'Y');
      h.definedone();
  end;
  if h.check^=0 then do;
    output want1;
    h.add();
  end;
  else output want2;
run;

The  strategy here is

  read a record

  if this is first time for these values (i.e. not in the hash object) then output want1 and put it in the hash object for later checking

  but if it's already in the hash then output want2.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data a;
input my1:$2. my2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
dd ee 3 4 5
ff gg 4 5 6
hh ii 6 7 8
;

run;

data b;
input cust1:$2. cust2:$2. clm drg tin;
datalines;
aa bb 1 2 3
bb cc 2 3 4
zz xx 3 4 5
yy ww 4 5 6
hh ii 2 7 8
;

proc sql;
create table want(drop=t) as
select *
from (select *, 1 as t from a
union all
select *, 2 as t from b)
group by my1,my2,clm,drg,tin
having count(distinct t)=1;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3917 views
  • 8 likes
  • 6 in conversation