I am trying to merge two datasets where the first data set contains two zip codes for each individual--their home zip code and their work zip code. The second data set relates each possible zip code to two variables (i.e. RUCA1 and RUCA2). I want to end up with a dataset with variables for RUCA1 and RUCA2 matched on home zip code and RUCA1 and RUCA2 matched on work zip code--creating a total of 4 RUCA variables.
dataset1
id ziphome zipwork
1 a a
2 b c
3 c b
4 d
5 e a
dataset2
zip RUCA1 RUCA2
a 10 10
b 9 9.2
c 8 8.6
d 1 1.1
e 10 10
What I want to end up with in dataset3 is two sets of RUCA1 and RUCA2 variables created for each zipcode (home and work)--matched by zip code. If I merge using a BY statement, my zip code variables won't match between data sets (i.e. ziphome in dataset1 and zip in dataset2).
dataset3
ziphome zipwork RUCA1home RUCA2home RUCA1work RUCA2work
a a 10 10 10 10
b c 9 9.2 8 8.6
c b 8 8.6 9 9.2
d 1 1.1
e a 10 10 10 10
In dataset3, ziphome and zipwork would be retained from dataset1 and RUCA1home and RUCA2home would be created by matching ziphome with zip, and RUCA1work and RUCA2work would be created by matching zipwork with zip.
A clumsy solution would be to duplicate the variable ziphome to a variable named zip in dataset1. Do the merge. Rename RUCA1 and RUCA2 to RUCA1home and RUCA2home. Then drop zip. Then duplicate zipwork to zip in dataset3. Do the merge. Then drop zip from dataset4. And rename RUCA1 and RUCA2 to RUCAwork1 and RUCA2work. But I feel like there should be a cleaner process.