I have two data sets with the same variables, but different observations. I need to know if any observations in data set 1 are in data set 2. How do I do this? Do I merge them first?
A few details would be helpful.
Could data set 1 contain two identical observations? How would you like to handle that?
Do you need to identify observations that are 100% identical, or just largely identical?
PROC COMPARE is one option.
Another option is to put your full observation in one varible - and convert it to a hash, using MD5 or SHA.
Based on that you can use either data step merge or SQL inner join.
data have1; set sashelp.class; run; data have2; set sashelp.class end=last; output; if last then do;name='xxxx';output;end; run; proc sql; create table obs_in_both as select * from have1 intersect select * from have2 ; quit;
This will output all observations in B that match any observation in A, which satisfies your criterion as long as neither dataset has duplicates, and A and B have the same variables.
data a b;
set sashelp.class;
if mod(_n_,3)=0 then output a b;
else if mod(_n_,3)=1 then output a;
else output b;
run;
data both;
set b;
if _n_=1 then do;
declare hash ha (dataset:'a');
ha.definekey(all:'Y');
ha.definedone();
end;
if ha.find()=0;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.