topic Re: Stacking two datasets with overlapping IDs in SAS Programming

Stacking two datasets with overlapping IDs

Walternate — Fri, 22 Jan 2016 17:02:23 GMT

Hi,

I have two datasets which I would like to stack. They have the same variables. The tricky part is that I only want to keep people from Dataset 2 if they don't already have an entry in Dataset 1; that is, I want to add to Dataset 1 those people from Dataset 2 that are unique and not already in the dataset only.

I know how to do this in two steps (merge the two such that only the people who are in Dataset 2 but not Dataset 1 are left, then stack that with Dataset 1 using a set statement), but I wanted to know if there's a way to do it in one step.

Any help is much appreciated.

Re: Stacking two datasets with overlapping IDs

DanZ — Fri, 22 Jan 2016 17:23:12 GMT

    data want;
    merge  have1(in=a)
           have2(in=b);
    by key;
    if (b=1 and a=0) /*keep people from 2 if not in 1*/
    or (b=0 and a=1) /*keep people that 1 if not in 2*/
    ;
    run;

Re: Stacking two datasets with overlapping IDs

data_null__ — Fri, 22 Jan 2016 17:22:01 GMT

This may do what you want. Without more details it is hard to say. Assumes one record per ID.

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;
data c;
   set a b;
   by id;
   if first.id;
   run;
proc print;
   run;

Re: Stacking two datasets with overlapping IDs

Reeza — Fri, 22 Jan 2016 17:33:22 GMT

A SQL solution that's probably not more efficient 🙂 and after borrowing @data_null__'s fake data.

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;

proc sql;
create table want as
select * from(
select * from a
union
select * from b
where id not in (select distinct Id from a)) a
order by id;
quit;

Re: Stacking two datasets with overlapping IDs

PGStats — Fri, 22 Jan 2016 18:37:43 GMT

What is missing in your request is what to do with people that are present in both datasets but with different information. If you want to keep both records, try this approach:

data a;
input id x$;
datalines;
1 a
3 b
4 c
6 d
7 e
;
data b;
input id x$;
datalines;
1 a
2 b
3 c
5 d
6 e
7 f
;

proc sql;
create table c as
select *
from
    (select * from a)
    union 
    (select * from b);
select * from c;
quit;

Re: Stacking two datasets with overlapping IDs

Astounding — Fri, 22 Jan 2016 20:39:45 GMT

As others have noted, it makes a big difference knowing how many observations might be in the data sets for each person. If there is only one observation for each person, the program is actually as simple as this:

data want;

merge dataset2 dataset1;

by key;

run;

Any KEY values that appear in both data sets will have the DATASET1 values in them after a MERGE. Of course, this assumes you are working with sorted data sets.

If there could be more than one observation per person in either data set, we need to know that. And we also need to know more details about what you want the outcome to be.