I have been reading a lot about using the Hash of Hashes technique to reduce hard-coding and increase the flexibility of your program. I wanted to try it out by defining a program that takes an unknown number of tables, as well as a distinct list of members from those tables, as an input and spits out a left join on member_id for all tables. However, I can't quite get it to work. My questions are, how do I make the following code work, and what concept/concepts am I misunderstanding below in the "try" data set? I think the problem is that I am not iterating through the hash object correctly - it looks to me like I am defining the hash of hashes as I intend to (using the HoH.output statement), but the merge only seems to work on one instance of hash table h.
data have1;
infile datalines delimiter=',';
input member_id $ results_measure_1 $;
datalines;
164-111,.8
297-888,.8
495-111,1
562-888,.7
697-111,.7
833-888,.5
;
run;
data have2;
infile datalines delimiter=',';
input member_id $ results_measure_2 $;
datalines;
164-111,1
297-888,1
562-888,.2
697-111,.7
;
run;
data have3;
infile datalines delimiter=',';
input member_id $ results_measure_3 $;
datalines;
344-111,1
562-888,.5
697-111,.9
;
run;
data have4;
infile datalines delimiter=',';
input table $ ;
datalines;
have1
have2
have3
;
run;
data have5;
infile datalines delimiter=',';
input member_id $ ;
datalines;
164-111
297-888
495-111
562-888
697-111
833-888
344-111
;
run;
data want;
infile datalines delimiter=",";
input member_id $ results_measure_1 $ results_measure_2 $ results_measure_3 $;
datalines;
344-111,0,0,1
562-888,.7,.2,.5
697-111,.7,.7,.9
164-111,.8,1,0
297-888,.8,1,0
495-111,1,0,0
833-888,.5,0,0
run;
data try;
if 0 then set have:;
if _n_ = 1 then do;
dcl hash HoH();
HoH.definekey("table");
HoH.definedata("h","table");
HoH.definedone();
dcl hiter HoHi("HoH");
do until (eof);
set have4 end=eof;
dcl hash h(dataset:table);
h.definekey("member_id");
h.definedata(all:"yes");
h.definedone();
HoH.add();
end;
end;
set have5;
do while (HoHi.next() = 0);
/* rc_iter = HoHi.next();*/
rc_h = h.find();
end;
HoH.output(dataset:"test");
h.output(dataset:"test_h");
run;
First and foremost - thank you for providing a usable program, with data, that we can use for our suggested modifications.
You didn't put it in words, but apparently you want one output record for each record in have5, collecting data from the datasets named in have4, using variables from the named datasets (HAVE1, HAVE2, HAVE3).
So a single change to your program would work:
But first, I think you are assuming that the result_measure variables are not retained in the datastep as it iterates through have5 (i.e. they are reset to missing values automatically). But this is wrong, for two reasons:
This means that, before you iterate through the hash objects for each table value, you should set the result_measures to a default value (I use missing values below, because it's easy to code "call missing ...").
So the only required change is the call missing statement in the last data step:
data have1;
infile datalines delimiter=',';
input member_id $ results_measure_1 $;
datalines;
164-111,.8
297-888,.8
495-111,1
562-888,.7
697-111,.7
833-888,.5
;
run;
data have2;
infile datalines delimiter=',';
input member_id $ results_measure_2 $;
datalines;
164-111,1
297-888,1
562-888,.2
697-111,.7
;
run;
data have3;
infile datalines delimiter=',';
input member_id $ results_measure_3 $;
datalines;
344-111,1
562-888,.5
697-111,.9
;
run;
data have4;
infile datalines delimiter=',';
input table $ ;
datalines;
have1
have2
have3
;
run;
data have5;
infile datalines delimiter=',';
input member_id $ ;
datalines;
164-111
297-888
495-111
562-888
697-111
833-888
344-111
;
run;
data want0;
infile datalines delimiter=",";
input member_id $ results_measure_1 $ results_measure_2 $ results_measure_3 $;
datalines;
344-111,0,0,1
562-888,.7,.2,.5
697-111,.7,.7,.9
164-111,.8,1,0
297-888,.8,1,0
495-111,1,0,0
833-888,.5,0,0
run;
data try;
if 0 then set have:;
if _n_ = 1 then do;
dcl hash HoH();
HoH.definekey("table");
HoH.definedata("h","table");
HoH.definedone();
dcl hiter HoHi("HoH");
do until (eof);
set have4 end=eof;
dcl hash h(dataset:table);
h.definekey("member_id");
h.definedata(all:"yes");
h.definedone();
HoH.add();
end;
end;
set have5;
call missing(of results_:); /*Add this: Initialize these variables */
do while (HoHi.next() = 0);
rc_h = h.find();
end;
/** HoH.output(dataset:"test"); Drop this **/
/** h.output(dataset:"test_h"); Drop this **/
run;
First and foremost - thank you for providing a usable program, with data, that we can use for our suggested modifications.
You didn't put it in words, but apparently you want one output record for each record in have5, collecting data from the datasets named in have4, using variables from the named datasets (HAVE1, HAVE2, HAVE3).
So a single change to your program would work:
But first, I think you are assuming that the result_measure variables are not retained in the datastep as it iterates through have5 (i.e. they are reset to missing values automatically). But this is wrong, for two reasons:
This means that, before you iterate through the hash objects for each table value, you should set the result_measures to a default value (I use missing values below, because it's easy to code "call missing ...").
So the only required change is the call missing statement in the last data step:
data have1;
infile datalines delimiter=',';
input member_id $ results_measure_1 $;
datalines;
164-111,.8
297-888,.8
495-111,1
562-888,.7
697-111,.7
833-888,.5
;
run;
data have2;
infile datalines delimiter=',';
input member_id $ results_measure_2 $;
datalines;
164-111,1
297-888,1
562-888,.2
697-111,.7
;
run;
data have3;
infile datalines delimiter=',';
input member_id $ results_measure_3 $;
datalines;
344-111,1
562-888,.5
697-111,.9
;
run;
data have4;
infile datalines delimiter=',';
input table $ ;
datalines;
have1
have2
have3
;
run;
data have5;
infile datalines delimiter=',';
input member_id $ ;
datalines;
164-111
297-888
495-111
562-888
697-111
833-888
344-111
;
run;
data want0;
infile datalines delimiter=",";
input member_id $ results_measure_1 $ results_measure_2 $ results_measure_3 $;
datalines;
344-111,0,0,1
562-888,.7,.2,.5
697-111,.7,.7,.9
164-111,.8,1,0
297-888,.8,1,0
495-111,1,0,0
833-888,.5,0,0
run;
data try;
if 0 then set have:;
if _n_ = 1 then do;
dcl hash HoH();
HoH.definekey("table");
HoH.definedata("h","table");
HoH.definedone();
dcl hiter HoHi("HoH");
do until (eof);
set have4 end=eof;
dcl hash h(dataset:table);
h.definekey("member_id");
h.definedata(all:"yes");
h.definedone();
HoH.add();
end;
end;
set have5;
call missing(of results_:); /*Add this: Initialize these variables */
do while (HoHi.next() = 0);
rc_h = h.find();
end;
/** HoH.output(dataset:"test"); Drop this **/
/** h.output(dataset:"test_h"); Drop this **/
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.