BookmarkSubscribeRSS Feed
j_b_s
Calcite | Level 5

I am trying to merge two datasets where the first data set contains two zip codes for each individual--their home zip code and their work zip code. The second data set relates each possible zip code to two variables (i.e. RUCA1 and RUCA2). I want to end up with a dataset with variables for RUCA1 and RUCA2 matched on home zip code and RUCA1 and RUCA2 matched on work zip code--creating a total of 4 RUCA variables.

 

dataset1

id ziphome zipwork

1 a a

2 b c

3 c b

4 d 

5 e a

 

dataset2

zip RUCA1 RUCA2

a 10 10

b 9 9.2

c 8 8.6

d 1 1.1

e 10 10

 

What I want to end up with in dataset3 is two sets of RUCA1 and RUCA2 variables created for each zipcode (home and work)--matched by zip code. If I merge using a BY statement, my zip code variables won't match between data sets (i.e. ziphome in dataset1 and zip in dataset2). 

 

dataset3

ziphome zipwork RUCA1home RUCA2home RUCA1work RUCA2work

a a 10 10 10 10

b c 9 9.2 8 8.6

c b 8 8.6 9 9.2

d   1 1.1

e a 10 10 10 10

 

In dataset3, ziphome and zipwork would be retained from dataset1 and RUCA1home and RUCA2home would be created by matching ziphome with zip, and RUCA1work and RUCA2work would be created by matching zipwork with zip.

 

A clumsy solution would be to duplicate the variable ziphome to a variable named zip in dataset1. Do the merge. Rename RUCA1 and RUCA2 to RUCA1home and RUCA2home. Then drop zip. Then duplicate zipwork to zip in dataset3. Do the merge. Then drop zip from dataset4. And rename RUCA1 and RUCA2 to RUCAwork1 and RUCA2work. But I feel like there should be a cleaner process.

1 REPLY 1
Patrick
Opal | Level 21

Below two options. Option 2 should perform better because it doesn't require any sorting.

data have1;
  infile datalines truncover;
  input id (ziphome zipwork) ($);
  datalines;
1 a a
2 b c
3 c b
4 d 
5 e a
;

data have2;
  infile datalines truncover;
  input zip $ RUCA1 RUCA2;
  datalines;
a 10 10
b 9 9.2
c 8 8.6
d 1 1.1
e 10 10
;

/* option 1 */
proc sql;
  create table want as
  select
    l.id,
    l.ziphome,
    l.zipwork,
    rh.ruca1 as ruca1home,
    rh.ruca2 as ruca2home,
    rw.ruca1 as ruca1work,
    rw.ruca2 as ruca2work

  from have1 l 

  left join have2 rh
  on l.ziphome=rh.zip

  left join have2 rw
  on l.zipwork=rw.zip

  order by l.id
  ;
quit;

/* option 2 */
data want;
  set have1;
  if _n_=1 then
    do;
      if 0 then set have2(keep=zip ruca1 ruca2);
      dcl hash h1(dataset:"have2");
      h1.defineKey("zip");
      h1.defineData("ruca1","ruca2");
      h1.defineDone();
    end;
  if h1.find(key:ziphome)=0 then
    do;
      ruca1home=ruca1;
      ruca2home=ruca2;
    end;
  if h1.find(key:zipwork)=0 then
    do;
      ruca1work=ruca1;
      ruca2work=ruca2;
    end;
  drop zip ruca1 ruca2;
run;

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 1097 views
  • 0 likes
  • 2 in conversation