01-22-2016 12:02 PM
I have two datasets which I would like to stack. They have the same variables. The tricky part is that I only want to keep people from Dataset 2 if they don't already have an entry in Dataset 1; that is, I want to add to Dataset 1 those people from Dataset 2 that are unique and not already in the dataset only.
I know how to do this in two steps (merge the two such that only the people who are in Dataset 2 but not Dataset 1 are left, then stack that with Dataset 1 using a set statement), but I wanted to know if there's a way to do it in one step.
Any help is much appreciated.
01-22-2016 12:14 PM - edited 01-22-2016 12:23 PM
data want; merge have1(in=a) have2(in=b); by key; if (b=1 and a=0) /*keep people from 2 if not in 1*/ or (b=0 and a=1) /*keep people that 1 if not in 2*/ ; run;
01-22-2016 12:15 PM - edited 01-22-2016 12:22 PM
This may do what you want. Without more details it is hard to say. Assumes one record per ID.
data a; do id = 1,3,4,6,7; output; end; retain x 'a'; run; data b; do id=1,2,3,5,6,7; output; end; retain x 'b'; run; data c; set a b; by id; if first.id; run; proc print; run;
01-22-2016 12:33 PM
A SQL solution that's probably not more efficient and after borrowing @data_null__'s fake data.
data a; do id = 1,3,4,6,7; output; end; retain x 'a'; run; data b; do id=1,2,3,5,6,7; output; end; retain x 'b'; run; proc sql; create table want as select * from( select * from a union select * from b where id not in (select distinct Id from a)) a order by id; quit;
01-22-2016 01:37 PM
What is missing in your request is what to do with people that are present in both datasets but with different information. If you want to keep both records, try this approach:
data a; input id x$; datalines; 1 a 3 b 4 c 6 d 7 e ; data b; input id x$; datalines; 1 a 2 b 3 c 5 d 6 e 7 f ; proc sql; create table c as select * from (select * from a) union (select * from b); select * from c; quit;
01-22-2016 03:39 PM
As others have noted, it makes a big difference knowing how many observations might be in the data sets for each person. If there is only one observation for each person, the program is actually as simple as this:
merge dataset2 dataset1;
Any KEY values that appear in both data sets will have the DATASET1 values in them after a MERGE. Of course, this assumes you are working with sorted data sets.
If there could be more than one observation per person in either data set, we need to know that. And we also need to know more details about what you want the outcome to be.