CatPaws
Calcite | Level 5

I have two data sets with the same variables, but different observations. I need to know if any observations in data set 1 are in data set 2. How do I do this? Do I merge them first?

5 REPLIES
Astounding
PROC Star

A few details would be helpful.

Could data set 1 contain two identical observations? How would you like to handle that?

Do you need to identify observations that are 100% identical, or just largely identical?

CatPaws
Calcite | Level 5
Data set 1 would not contain identical observations within itself. I need to identify observations that are 100% identical between the data sets. For example, I need to know whether there is an observation in data set 1 that is also in data set 2, or vice versa.
LinusH
Tourmaline | Level 20

PROC COMPARE is one option.
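
A minimal sketch of the PROC COMPARE route, assuming two data sets named have1 and have2 built from sashelp.class, and assuming NAME uniquely identifies a row (both are illustrative assumptions, not details from the question). PROC COMPARE matches rows by the ID variable and reports whether the matched rows differ, so it fits best when such a key exists; it does not search for an identical row anywhere in the other data set.

```sas
/* Sample data (assumed): have2 is have1 plus one extra row */
data have1;
  set sashelp.class;
run;

data have2;
  set sashelp.class end=last;
  output;
  if last then do;
    name = 'xxxx';
    output;
  end;
run;

/* The ID statement expects both data sets to be sorted by the ID variable */
proc sort data=have1; by name; run;
proc sort data=have2; by name; run;

/* LISTALL also reports observations and variables found in only one data set */
proc compare base=have1 compare=have2 listall;
  id name;
run;
```

The report, rather than an output table, is the main result here, so this is probably more useful for checking two versions of the same data than for extracting the common rows.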

Another option is to concatenate the full observation into one variable and convert it to a hash using MD5 or SHA.

With that hash as the key, you can use either a DATA step merge or an SQL inner join.
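
The hash-key idea could look roughly like this. It is a sketch under assumptions: the data sets are named have1 and have2 and have the sashelp.class variables, and the macro add_row_key and the variable row_key are hypothetical names introduced for illustration. Numeric values are written with the HEX16. format so that the key reflects the stored value rather than a rounded display value.

```sas
/* Sample data (assumed): have2 is have1 plus one extra row */
data have1;
  set sashelp.class;
run;

data have2;
  set sashelp.class end=last;
  output;
  if last then do;
    name = 'xxxx';
    output;
  end;
run;

/* Hypothetical helper: adds a 32-character hex MD5 digest of the whole row */
%macro add_row_key(in=, out=);
  data &out;
    set &in;
    length row_key $32;
    row_key = put(md5(cats(name, '|', sex, '|',
                           put(age, hex16.), '|',
                           put(height, hex16.), '|',
                           put(weight, hex16.))), $hex32.);
  run;
%mend add_row_key;

%add_row_key(in=have1, out=key1)
%add_row_key(in=have2, out=key2)

/* Rows whose digest appears in both data sets */
proc sql;
  create table obs_in_both(drop=row_key) as
  select a.*
  from key1 as a
       inner join key2 as b
       on a.row_key = b.row_key;
quit;
```

Two caveats with this sketch: CATS strips leading and trailing blanks from each value, and a '|' inside a character value would blur the field boundaries, so the delimiter and the concatenation function may need adjusting for real data. The variable list also has to be written out for each layout.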

Ksharp
Super User
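data have1;
  set sashelp.class;
run;

/* have2 = every row of sashelp.class plus one extra row that is not in have1 */
data have2;
  set sashelp.class end=last;
  output;
  if last then do;
    name='xxxx';
    output;
  end;
run;

/* INTERSECT keeps only the rows that appear in both tables, comparing every column */
proc sql;
  create table obs_in_both as
  select * from have1
  intersect
  select * from have2
  ;
quit;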
mkeintz
PROC Star

This will output all observations in B that match any observation in A, which satisfies your criterion as long as neither dataset has duplicates, and A and B have the same variables. 

 

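/* Sample data: every third row goes to both A and B, the rest to only one of them */
data a b;
  set sashelp.class;
  if mod(_n_,3)=0 then output a b;
  else if mod(_n_,3)=1 then output a;
  else output b;
run;

data both;
  set b;
  if _n_=1 then do;
    /* Load A into a hash object keyed on all of its variables */
    declare hash ha (dataset:'a');
    ha.definekey(all:'Y');
    ha.definedone();
  end;
  /* Keep the current row of B only if an identical row exists in A */
  if ha.find()=0;
run;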