Hi all. I hope folks are healthy!
Should be a simple question. How to join two data sets that have mostly the same, but a few different, IDs?
data set 1
ID var1 var2
A 1 2
B 2 1
C 4 2
data set 2
ID var1 var2
A 1 2
C 4 2
D 2 1
Or create 3 data sets, one with common IDs, one with IDs only in data set 1, and one with IDs only in data set 2. I want to identify the IDs that are only in one data set.
Thanks
Stay healthy, please.
@geneshackman wrote:
Hi all. I hope folks are healthy!
Should be a simple question. How to join two data sets that have mostly the same, but a few different, IDs?
data set 1
ID var1 var2
A 1 2
B 2 1
C 4 2
data set 2
ID var1 var2
A 1 2
C 4 2
D 2 1
What is the desired output? All records from both data sets including mismatches, or only those records in both, or something else? Do you want the values of var1 and var2 in one data set to overwrite the values on the matching record in the other data set?
Good question, thanks for asking. I would like the output to be something like this
ID data1var1 data1var2 data2var1 data2var2
A 1 2 1 2
B 2 1
C 4 2 4 2
D 2 1
I'm not sure how the above will show up, so let me see if this makes sense. I guess the columns would be ID, then the variables in data set 1, then the variables in data set 2. I could rename the variables in data set 1 or 2 or both to have "dataset1" at the beginning of the var name, like dataset1var1, or shorter, d1var1, d1var2, d2var1, d2var2.
Make sense?
The output would have all records from both data sets, including IDs are only in one of them.
/* UNTESTED CODE */
data want;
merge dataset1(rename=(var1=ds1_var1 var2=ds1_var2))
dataset2(rename(var1=ds2_var1 var2=ds2_var2));
by id;
run;
This assumes both datasets are sorted by the variable named ID.
Hello,
The below program may be beneficial to you as well :
data set_1;
input ID $ var1 var2;
cards;
A 1 2
B 2 1
C 4 2
;
run;
data set_2;
input ID $ var1 var2;
cards;
A 1 2
C 4 2
D 2 1
;
run;
proc sort data=set_1 out=s_set_1;
by ID;
run;
proc sort data=set_2 out=s_set_2;
by ID;
run;
data want1 want2 want3;
merge s_set_1(in=int1) s_set_2(in=int2);
by ID;
if int1 and int2 then output want1;
if int1 and not int2 then output want2;
if not int1 and int2 then output want3;
run;
/* end of program */
Koen
Thanks, this looks great too.
data set_1;
input ID $ var1 var2;
cards;
A 1 2
B 2 1
C 4 2
;
run;
data set_2;
input ID $ var1 var2;
cards;
A 1 2
C 4 2
D 2 1
;
run;
proc sql;
create table commonIDs as
select id from set_1
intersect
select id from set_2;
create table IDsonlyindataset1 as
select id from set_1
except
select id from set_2;
create table IDsonlyindataset2 as
select id from set_2
except
select id from set_1;
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.