Being still novel to hashing I wonder if anyone can make my attempt for a left-join more elegant and/ or efficient. Here is what I have (first generating sample data as preparation called [have_source], then calculate needed structure in dataset [have]) : * Generate sample data;
data have_source (drop=i);
do id=1 to 1000;
do i=65 to 65+6; * Generate letters A-G;
do arefpce=1 to 9; * [arefpce] = episode counter (episode can be x days);
paramcd=byte(i);
aval=rand("integer", 0, 15); * Random number of events within one episode of type "paramcd";
output;
end;
end;
end;
run; Create needed structure using data [have_source]: data have (drop= aval_:);
do until (last.paramcd);
set have_source;
by id paramcd arefpce;
* Summarize number of events in episodes 2-3;
if (arefpce >= 2 AND arefpce <= 3) then do;
aval_23=sum(aval_23, aval);
end;
* Summarize number of events in episodes 4-9;
if (arefpce >= 4 AND arefpce <= 9) then do;
aval_49=sum(aval_49, aval);
end;
* Summarize number of events in episodes 7-9;
if (arefpce >= 7 AND arefpce <= 9) then do;
aval_79=sum(aval_79, aval);
end;
end;
* Print summarized results for each time frame ([arefpce]=2-3, 4-9, 7-9) separately to output;
arefpce=203;
aval=aval_23;
output;
arefpce=409;
aval=aval_49;
output;
arefpce=709;
aval=aval_79;
output;
* Run through dataset again to output "detail" data where calculations were based on;
do until (last.paramcd);
set have;
by id paramcd arefpce;
output;
end;
run; Sort it: * Sort dataset to have the detail data before the summarized data;
proc sort data=have;
by id paramcd arefpce;
run; The first few lines look like this: Dataset "have" (first few lines) ------------------ Now the "want" part: A left join using subsets of the [have] dataset. First the SQL solution (serving as reference): proc sql;
create table want_sql as
select l.id
, l.paramcd
, l.aval as aval23
, r49.aval as aval49
, r79.aval as aval79
from have (where=(arefpce eq 203)) as l
left join have (where=(arefpce eq 409)) as r49 on l.id eq r49.id and l.paramcd eq r49.paramcd
left join have (where=(arefpce eq 709)) as r79 on l.id eq r79.id and l.paramcd eq r79.paramcd;
quit; The same using base SAS: data want_base (drop=arefpce:);
merge have (where=(arefpce_23 = 203) rename=(aval=aval_23 arefpce=arefpce_23) in=h_23)
have (where=(arefpce_49 = 409) rename=(aval=aval_49 arefpce=arefpce_49) in=h_49)
have (where=(arefpce_79 = 709) rename=(aval=aval_79 arefpce=arefpce_79) in=h_79);
by id paramcd;
if h_23;
run; The output looks like this: Dataset [want] (first few lines) -------------------------------- My attempt on hashing works, but is lacking clarity/ efficiency and also not so concise than I imagine it could be: * Want --> Left join using hashing (not SQL) in an elegant and/ or efficient manner;
data _null_;
if 0 then set have (keep=id paramcd /*aval*/)
have (keep=id aval rename=(aval=aval23))
have (keep=id aval rename=(aval=aval49))
have (keep=id aval rename=(aval=aval79));
length table $41
_where $200;
if _n_=1 then do;
declare hash h_out(ordered: "yes");
h_out.defineKey("id", "paramcd");
h_out.defineData("id", "paramcd", "aval23", "aval49", "aval79");
h_out.defineDone();
table="have";
_where="(where=(arefpce=203) rename=(aval=aval23))";
declare hash h23(dataset: cats(table, _where));
h23.defineKey("id", "paramcd");
h23.defineData("aval23");
h23.defineDone();
_where="(where=(arefpce=409) rename=(aval=aval49))";
declare hash h49(dataset: cats(table, _where));
h49.defineKey("id", "paramcd");
h49.defineData("aval49");
h49.defineDone();
_where="(where=(arefpce=709) rename=(aval=aval79))";
declare hash h79(dataset: cats(table, _where));
h79.defineKey("id", "paramcd");
h79.defineData("aval79");
h79.defineDone();
call missing(aval23, aval49, aval79);
end;
do until (done);
set have (where=(arefpce=203)) end=done;
rc=h23.find();
if (rc=0) then h_out.replace();
if (rc=0 AND h49.find()=0) then h_out.replace();
else do;
call missing(aval49);
h_out.replace();
end;
if (rc=0 AND h79.find()=0) then h_out.replace();
else do;
call missing(aval79);
h_out.replace();
end;
end;
h_out.output(dataset: "want");
run; Obviously solutions can emphasize several aspects (brevity, readability, flexibility, elegance, etc.) and therefore be rather cryptic, verbatim, more complex (hash of hashes), etc. So I'm happy to learn some more options on this example.
... View more