BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets which I would like to stack. They have the same variables. The tricky part is that I only want to keep people from Dataset 2 if they don't already have an entry in Dataset 1; that is, I want to add to Dataset 1 those people from Dataset 2 that are unique and not already in the dataset only. 

 

I know how to do this in two steps (merge the two such that only the people who are in Dataset 2 but not Dataset 1 are left, then stack that with Dataset 1 using a set statement), but I wanted to know if there's a way to do it in one step.

 

Any help is much appreciated. 

5 REPLIES 5
DanZ
Obsidian | Level 7
    data want;
    merge  have1(in=a)
           have2(in=b);
    by key;
    if (b=1 and a=0) /*keep people from 2 if not in 1*/
    or (b=0 and a=1) /*keep people that 1 if not in 2*/
    ;
    run;
data_null__
Jade | Level 19

This may do what you want.  Without more details it is hard to say.  Assumes one record per ID.

 

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;
data c;
   set a b;
   by id;
   if first.id;
   run;
proc print;
   run;

 

 

 Capture.PNG

Reeza
Super User

A SQL solution that's probably not more efficient 🙂 and after borrowing @data_null__'s fake data.

 

data a;
   do id = 1,3,4,6,7;
      output;
      end;
   retain x 'a';
   run;
data b;
   do id=1,2,3,5,6,7;
      output;
      end;
   retain x 'b';
   run;

proc sql;
create table want as
select * from(
select * from a
union
select * from b
where id not in (select distinct Id from a)) a
order by id;
quit;
PGStats
Opal | Level 21

What is missing in your request is what to do with people that are present in both datasets but with different information. If you want to keep both records, try this approach:

 

data a;
input id x$;
datalines;
1 a
3 b
4 c
6 d
7 e
;
data b;
input id x$;
datalines;
1 a
2 b
3 c
5 d
6 e
7 f
;

proc sql;
create table c as
select *
from
    (select * from a)
    union 
    (select * from b);
select * from c;
quit;
PG
Astounding
PROC Star

As others have noted, it makes a big difference knowing how many observations might be in the data sets for each person.  If there is only one observation for each person, the program is actually as simple as this:

 

data want;

merge dataset2 dataset1;

by key;

run;

 

Any KEY values that appear in both data sets will have the DATASET1 values in them after a MERGE.  Of course, this assumes you are working with sorted data sets.

 

If there could be more than one observation per person in either data set, we need to know that.  And we also need to know more details about what you want the outcome to be.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2216 views
  • 0 likes
  • 6 in conversation