I have two data sets with the same variables, but different observations. I need to know if any observations in data set 1 are in data set 2. How do I do this? Do I merge them first?
A few details would be helpful.
Could data set 1 contain two identical observations? How would you like to handle that?
Do you need to identify observations that are 100% identical, or just largely identical?
PROC COMPARE is one option.
Another option is to concatenate the full observation into one variable and convert it to a hash value using MD5 or SHA.
Based on that hash value, you can use either a DATA step merge or an SQL inner join.
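For illustration, here is a minimal sketch of the hash-value approach, assuming both data sets are copies of sashelp.class (variables name, sex, age, height, weight). The names have1, have2, h1, h2, rowkey and obs_in_both_md5 are made up for this example. Two caveats: CATX formats numeric values before concatenating them and skips blank character values, so this is not guaranteed to be an exact comparison for every kind of data, and duplicate rows in either data set would multiply in the join.

```sas
/* Assumption: two example data sets with identical variables */
data have1;
  set sashelp.class;
run;

data have2;
  set sashelp.class end=last;
  output;
  if last then do;
    name='xxxx';   /* one extra row that is not in have1 */
    output;
  end;
run;

/* Build one 32-character hex digest per observation */
data h1;
  set have1;
  length rowkey $32;
  rowkey = put(md5(catx('|', name, sex, age, height, weight)), $hex32.);
run;

data h2;
  set have2;
  length rowkey $32;
  rowkey = put(md5(catx('|', name, sex, age, height, weight)), $hex32.);
run;

/* Rows of have1 whose digest also occurs in have2 */
proc sql;
  create table obs_in_both_md5 as
  select a.*
  from h1 as a
  inner join h2 as b
  on a.rowkey = b.rowkey;
quit;
```

The delimiter in CATX keeps field boundaries distinct, so that values such as 'ab','c' and 'a','bc' do not produce the same string before hashing.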
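data have1;
  set sashelp.class;
run;

/* have2 = have1 plus one extra row that does not exist in have1 */
data have2;
  set sashelp.class end=last;
  output;
  if last then do;
    name='xxxx';
    output;
  end;
run;

/* INTERSECT keeps only the rows that appear in both tables */
proc sql;
  create table obs_in_both as
  select * from have1
  intersect
  select * from have2;
quit;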
This will output all observations in B that match any observation in A, which satisfies your criterion as long as neither dataset has duplicates, and A and B have the same variables.
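/* Sample data: every third row goes to both A and B, the rest to one of them */
data a b;
  set sashelp.class;
  if mod(_n_,3)=0 then output a b;
  else if mod(_n_,3)=1 then output a;
  else output b;
run;

data both;
  set b;
  if _n_=1 then do;
    /* Load A into a hash object keyed on all of its variables */
    declare hash ha (dataset:'a');
    ha.definekey(all:'Y');
    ha.definedone();
  end;
  /* Keep the current row of B only if an identical row exists in A */
  if ha.find()=0;
run;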
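If duplicates are possible, one option is to remove fully identical rows first and then run the same hash lookup. This is only a sketch under that assumption; a_nodup and b_nodup are names made up for this example, and sorting changes the order of the rows.

```sas
/* Same sample data as in the step above */
data a b;
  set sashelp.class;
  if mod(_n_,3)=0 then output a b;
  else if mod(_n_,3)=1 then output a;
  else output b;
run;

/* BY _ALL_ makes every variable a sort key, so NODUPKEY drops identical rows */
proc sort data=a out=a_nodup nodupkey;
  by _all_;
run;

proc sort data=b out=b_nodup nodupkey;
  by _all_;
run;

data both;
  set b_nodup;
  if _n_=1 then do;
    declare hash ha (dataset:'a_nodup');
    ha.definekey(all:'Y');
    ha.definedone();
  end;
  /* Keep the row of B only if an identical row exists in A */
  if ha.find()=0;
run;
```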