BookmarkSubscribeRSS Feed
CatPaws
Calcite | Level 5

I have two data sets with the same variables, but different observations. I need to know if any observations in data set 1 are in data set 2. How do I do this? Do I merge them first?

5 REPLIES 5
Astounding
PROC Star

A few details would be helpful.

 

Could data set 1 contain two identical observations?  How would you like to handle that?

 

Do you need to identify observations that are 100% identical, or just largely identical?

 

CatPaws
Calcite | Level 5
Data set 1 would not have identical observations within the dataset. I need to identify obervations that are 100% identical between data sets. For example, I need to know if there is an oberservation in data set 1 that is also in data set 2 or vice versa.
LinusH
Tourmaline | Level 20

PROC COMPARE is one option.

Another option is to put your full observation in one varible - and convert it to a hash, using MD5 or SHA.

Based on that you can use either data step merge or SQL inner join.

Data never sleeps
Ksharp
Super User
data have1;
 set sashelp.class;
run;

data have2;
 set sashelp.class end=last;
 output;
 if last then do;name='xxxx';output;end;
run;


proc sql;
create table obs_in_both as 
select * from have1
intersect
select * from have2
;
quit;
mkeintz
PROC Star

This will output all observations in B that match any observation in A, which satisfies your criterion as long as neither dataset has duplicates, and A and B have the same variables. 

 

data a b;
  set sashelp.class;
  if mod(_n_,3)=0 then output a b;
  else if mod(_n_,3)=1 then output a;
  else output b;
run;

data both;
  set b;
  if _n_=1 then do;
    declare hash ha (dataset:'a');
      ha.definekey(all:'Y');
      ha.definedone();
  end;
  if ha.find()=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 803 views
  • 0 likes
  • 5 in conversation