I am trying to merge two data sets. Dataset1 has multiple observations for each id, while dataset2 has a single row of observations for each id. I want to merge the data sets so that the observations in dataset2 are duplicated into each id in dataset1.
For example if this is the data set I have:
dataset1;
input id $ weight age;
cards;
1 108 23
1 114 24
1 118 25
2 100 45
2 108 46
2 110 47
;
run;
dataset2;
input id $ income married $ smoking $
cards;
1 40,000 y y
2 58,000 y n
;
run;
I want to combine so that the final data set looks like this:
datafinal;
input id $ weight age income married $ smoking $;
cards;
1 108 23 40,000 y y
1 114 24 40,000 y y
1 118 25 40,000 y y
2 100 45 58,000 y n
2 108 46 58,000 y n
2 110 47 58,000 y n
;
run;
I tried doing
data combined;
merge dataone datatwo;
run;
but what I ended up was with
datafinal;
input id $ weight age income married $ smoking $;
1 108 23 40,000 y y
1 114 24
1 118 25
2 100 45 58,000 y n
2 108 46
2 110 47
because the data from dataset2 was not copied into each id in dataset1. Is there a way that I can duplicate data while merging two sets?
Thanks!
data set1;
input id $ weight age;
cards;
1 108 23
1 114 24
1 118 25
2 100 45
2 108 46
2 110 47
;
run;
data set2;
input id $ income :comma10. married $ smoking $;
cards;
1 40,000 y y
2 58,000 y n
;
run;
data want;
merge set1 set2;
by id;
run;
data set1;
input id $ weight age;
cards;
1 108 23
1 114 24
1 118 25
2 100 45
2 108 46
2 110 47
;
run;
data set2;
input id $ income :comma10. married $ smoking $;
cards;
1 40,000 y y
2 58,000 y n
;
run;
data want;
merge set1 set2;
by id;
run;
use BY:
data set1;
input id $ weight age;
cards;
1 108 23
1 114 24
1 118 25
2 100 45
2 108 46
2 110 47
;
run;
data set2;
input id $ income married $ smoking $;
cards;
1 40000 y y
2 58000 y n
;
run;
data combined;
merge set1 set2;
by id;
run;
proc print;
run;
B.
Hi,
Try doing a LEFT JOIN instead of MERGE. Use the column ID for doing the JOIN function.
Many to one merges work fine with data step MERGE statement.
But you have to tell it what variable(s) to use to match them.
That is done with the BY statement.
You should have seen a note in the log warning that you forgot to include the BY statement.
Note that both datasets need to be sorted by the BY variable(s).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.