I'm trying to do a data step with multiple hash joins, and the main criteria is that the first hash join has multiple records and will grow my initial dataset. After I join that, I want to continue with that larger dataset and do more 1-to-1 hash joins with additional logic - however, i'm having trouble with this second step.
Here is some example code (ideally, I would see field_4 in my final output):
data data1;
infile datalines delimiter='|';
format
field_1 $10.
field_2 $10.
;
input
field_1: $10.
field_2: $10.
;
datalines;
A|M
B|F
C|F
run;
data data2;
infile datalines delimiter='|';
format
field_1 $10.
field_3 $10.
;
input
field_1: $10.
field_3: $10.
;
datalines;
A|A
B|A
B|B
B|C
C|A
C|D
run;
data data3;
infile datalines delimiter='|';
format
field_3 $10.
field_4 5.
;
input
field_3: $10.
field_4: 5.
;
datalines;
A|1
B|4
C|7
D|9
run;
data hash_join_template;
if _n_ = 1 then do;
declare hash add1(dataset:'data2',multidata:"Y");
add1.defineKey('field_1');
add1.defineData('field_3');
add1.defineDone();
declare hash add2(dataset:'data3');
add2.defineKey('field_3');
add2.defineData('field_4');
add2.defineDone();
end;
set data1;
/* Do multidata join and output duplicate records */
format field_3 $10.;
rc_join1 = add1.find();
output;
do while (rc_join1 = 0);
rc_join1 = add1.find_next();
if rc_join1=0 then output;
end;
drop rc_join1;
/* Attempting to then take my larger-than-original dataset and continue doing logic/joins, but is blank */
format field_4 5.;
rc_join2 = add2.find();
drop rc_join2;
run;
Anyone see where I'm going wrong? Thank you ahead of time!
@derekcra your overall problem is that you want to do an iterative lookup over add1 and for each lookup in add1, do a lookup in add2 (as I interpret your code).
Therefore, you should have the add2.find() Method call inside the loop.
Something like this will do
data hash_join_template;
if _n_ = 1 then do;
declare hash add1(dataset:'data2',multidata:"Y");
add1.defineKey('field_1');
add1.defineData('field_3');
add1.defineDone();
declare hash add2(dataset:'data3');
add2.defineKey('field_3');
add2.defineData('field_4');
add2.defineDone();
end;
set data1;
format field_3 $10.;
format field_4 5.;
do while (add1.do_over() = 0);
rc_join2 = add2.find();
output;
end;
drop rc_join2;
run;
Result:
field_1 field_2 field_3 field_4 A M A 1 B F A 1 B F B 4 B F C 7 C F A 1 C F D 9
@derekcra your overall problem is that you want to do an iterative lookup over add1 and for each lookup in add1, do a lookup in add2 (as I interpret your code).
Therefore, you should have the add2.find() Method call inside the loop.
Something like this will do
data hash_join_template;
if _n_ = 1 then do;
declare hash add1(dataset:'data2',multidata:"Y");
add1.defineKey('field_1');
add1.defineData('field_3');
add1.defineDone();
declare hash add2(dataset:'data3');
add2.defineKey('field_3');
add2.defineData('field_4');
add2.defineDone();
end;
set data1;
format field_3 $10.;
format field_4 5.;
do while (add1.do_over() = 0);
rc_join2 = add2.find();
output;
end;
drop rc_join2;
run;
Result:
field_1 field_2 field_3 field_4 A M A 1 B F A 1 B F B 4 B F C 7 C F A 1 C F D 9
Ah I see - thank you! I will play around with this but it makes sense offhand. Thanks again.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.