DATA Step, Macro, Functions and more

Stacking two datasets with overlapping IDs

Reply
Frequent Contributor
Posts: 138

Stacking two datasets with overlapping IDs

Hi,

 

I have two datasets which I would like to stack. They have the same variables. The tricky part is that I only want to keep people from Dataset 2 if they don't already have an entry in Dataset 1; that is, I want to add to Dataset 1 those people from Dataset 2 that are unique and not already in the dataset only. 

 

I know how to do this in two steps (merge the two such that only the people who are in Dataset 2 but not Dataset 1 are left, then stack that with Dataset 1 using a set statement), but I wanted to know if there's a way to do it in one step.

 

Any help is much appreciated. 

Contributor
Posts: 38

Re: Stacking two datasets with overlapping IDs

[ Edited ]
Posted in reply to Walternate
    data want;
    merge  have1(in=a)
           have2(in=b);
    by key;
    if (b=1 and a=0) /*keep people from 2 if not in 1*/
    or (b=0 and a=1) /*keep people that 1 if not in 2*/
    ;
    run;
Respected Advisor
Posts: 3,799

Re: Stacking two datasets with overlapping IDs

[ Edited ]
Posted in reply to Walternate

This may do what you want.  Without more details it is hard to say.  Assumes one record per ID.

 

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;
data c;
   set a b;
   by id;
   if first.id;
   run;
proc print;
   run;

 

 

 Capture.PNG

Super User
Posts: 19,833

Re: Stacking two datasets with overlapping IDs

Posted in reply to Walternate

A SQL solution that's probably not more efficient Smiley Happy and after borrowing @data_null__'s fake data.

 

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;

proc sql;
create table want as
select * from(
select * from a
union
select * from b
where id not in (select distinct Id from a)) a
order by id;
quit;
Respected Advisor
Posts: 4,927

Re: Stacking two datasets with overlapping IDs

Posted in reply to Walternate

What is missing in your request is what to do with people that are present in both datasets but with different information. If you want to keep both records, try this approach:

 

data a;
input id x$;
datalines;
1 a
3 b
4 c
6 d
7 e
;
data b;
input id x$;
datalines;
1 a
2 b
3 c
5 d
6 e
7 f
;

proc sql;
create table c as
select *
from
    (select * from a)
    union 
    (select * from b);
select * from c;
quit;
PG
Super User
Posts: 5,513

Re: Stacking two datasets with overlapping IDs

Posted in reply to Walternate

As others have noted, it makes a big difference knowing how many observations might be in the data sets for each person.  If there is only one observation for each person, the program is actually as simple as this:

 

data want;

merge dataset2 dataset1;

by key;

run;

 

Any KEY values that appear in both data sets will have the DATASET1 values in them after a MERGE.  Of course, this assumes you are working with sorted data sets.

 

If there could be more than one observation per person in either data set, we need to know that.  And we also need to know more details about what you want the outcome to be.

Ask a Question
Discussion stats
  • 5 replies
  • 309 views
  • 0 likes
  • 6 in conversation