- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Being still novel to hashing I wonder if anyone can make my attempt for a left-join more elegant and/ or efficient.
Here is what I have (first generating sample data as preparation called [have_source], then calculate needed structure in dataset [have]) :
* Generate sample data;
data have_source (drop=i);
do id=1 to 1000;
do i=65 to 65+6; * Generate letters A-G;
do arefpce=1 to 9; * [arefpce] = episode counter (episode can be x days);
paramcd=byte(i);
aval=rand("integer", 0, 15); * Random number of events within one episode of type "paramcd";
output;
end;
end;
end;
run;
Create needed structure using data [have_source]:
data have (drop= aval_:);
do until (last.paramcd);
set have_source;
by id paramcd arefpce;
* Summarize number of events in episodes 2-3;
if (arefpce >= 2 AND arefpce <= 3) then do;
aval_23=sum(aval_23, aval);
end;
* Summarize number of events in episodes 4-9;
if (arefpce >= 4 AND arefpce <= 9) then do;
aval_49=sum(aval_49, aval);
end;
* Summarize number of events in episodes 7-9;
if (arefpce >= 7 AND arefpce <= 9) then do;
aval_79=sum(aval_79, aval);
end;
end;
* Print summarized results for each time frame ([arefpce]=2-3, 4-9, 7-9) separately to output;
arefpce=203;
aval=aval_23;
output;
arefpce=409;
aval=aval_49;
output;
arefpce=709;
aval=aval_79;
output;
* Run through dataset again to output "detail" data where calculations were based on;
do until (last.paramcd);
set have;
by id paramcd arefpce;
output;
end;
run;
Sort it:
* Sort dataset to have the detail data before the summarized data;
proc sort data=have;
by id paramcd arefpce;
run;
The first few lines look like this:
------------------
Now the "want" part: A left join using subsets of the [have] dataset.
First the SQL solution (serving as reference):
proc sql;
create table want_sql as
select l.id
, l.paramcd
, l.aval as aval23
, r49.aval as aval49
, r79.aval as aval79
from have (where=(arefpce eq 203)) as l
left join have (where=(arefpce eq 409)) as r49 on l.id eq r49.id and l.paramcd eq r49.paramcd
left join have (where=(arefpce eq 709)) as r79 on l.id eq r79.id and l.paramcd eq r79.paramcd;
quit;
The same using base SAS:
data want_base (drop=arefpce:);
merge have (where=(arefpce_23 = 203) rename=(aval=aval_23 arefpce=arefpce_23) in=h_23)
have (where=(arefpce_49 = 409) rename=(aval=aval_49 arefpce=arefpce_49) in=h_49)
have (where=(arefpce_79 = 709) rename=(aval=aval_79 arefpce=arefpce_79) in=h_79);
by id paramcd;
if h_23;
run;
The output looks like this:
--------------------------------
My attempt on hashing works, but is lacking clarity/ efficiency and also not so concise than I imagine it could be:
* Want --> Left join using hashing (not SQL) in an elegant and/ or efficient manner;
data _null_;
if 0 then set have (keep=id paramcd /*aval*/)
have (keep=id aval rename=(aval=aval23))
have (keep=id aval rename=(aval=aval49))
have (keep=id aval rename=(aval=aval79));
length table $41
_where $200;
if _n_=1 then do;
declare hash h_out(ordered: "yes");
h_out.defineKey("id", "paramcd");
h_out.defineData("id", "paramcd", "aval23", "aval49", "aval79");
h_out.defineDone();
table="have";
_where="(where=(arefpce=203) rename=(aval=aval23))";
declare hash h23(dataset: cats(table, _where));
h23.defineKey("id", "paramcd");
h23.defineData("aval23");
h23.defineDone();
_where="(where=(arefpce=409) rename=(aval=aval49))";
declare hash h49(dataset: cats(table, _where));
h49.defineKey("id", "paramcd");
h49.defineData("aval49");
h49.defineDone();
_where="(where=(arefpce=709) rename=(aval=aval79))";
declare hash h79(dataset: cats(table, _where));
h79.defineKey("id", "paramcd");
h79.defineData("aval79");
h79.defineDone();
call missing(aval23, aval49, aval79);
end;
do until (done);
set have (where=(arefpce=203)) end=done;
rc=h23.find();
if (rc=0) then h_out.replace();
if (rc=0 AND h49.find()=0) then h_out.replace();
else do;
call missing(aval49);
h_out.replace();
end;
if (rc=0 AND h79.find()=0) then h_out.replace();
else do;
call missing(aval79);
h_out.replace();
end;
end;
h_out.output(dataset: "want");
run;
Obviously solutions can emphasize several aspects (brevity, readability, flexibility, elegance, etc.) and therefore be rather cryptic, verbatim, more complex (hash of hashes), etc. So I'm happy to learn some more options on this example.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a simple hash "left join"
data want_hash (drop=rc);
if _N_ = 1 then do;
dcl hash h49 (dataset : "have(where=(arefpce=409) rename=(aval=aval49))");
h49.defineKey("id", "paramcd");
h49.defineData("aval49");
h49.defineDone();
dcl hash h79 (dataset : "have(where=(arefpce=709) rename=(aval=aval79))");
h79.defineKey("id", "paramcd");
h79.defineData("aval79");
h79.defineDone();
end;
set have(where=(arefpce=203) rename=(aval=aval23));
call missing (aval49, aval79);
rc = h49.find();
rc = h79.find();
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a simple hash "left join"
data want_hash (drop=rc);
if _N_ = 1 then do;
dcl hash h49 (dataset : "have(where=(arefpce=409) rename=(aval=aval49))");
h49.defineKey("id", "paramcd");
h49.defineData("aval49");
h49.defineDone();
dcl hash h79 (dataset : "have(where=(arefpce=709) rename=(aval=aval79))");
h79.defineKey("id", "paramcd");
h79.defineData("aval79");
h79.defineDone();
end;
set have(where=(arefpce=203) rename=(aval=aval23));
call missing (aval49, aval79);
rc = h49.find();
rc = h79.find();
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also, you could do:
data want_hash (drop=aval rc);
dcl hash h49 ();
h49.defineKey("id", "paramcd");
h49.defineData("aval");
h49.defineDone();
dcl hash h79 ();
h79.defineKey("id", "paramcd");
h79.defineData("aval");
h79.defineDone();
do until (lr1);
set have (where=(arefpce in (409, 709))) end=lr1 ;
if arefpce=409 then h49.add();
else h79.add();
end;
do until (lr2);
set have (where=(arefpce = 203)) end=lr2;
aval23 = aval;
rc = h49.find(); aval49 = aval;
rc = h79.find(); aval79 = aval;
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You create a hash for the AVAL23 subset, but you also read the data directly. Why? And there is no reason to put the output in a hash. That is mostly used when you want to create a number of tables dynamically.
I do not think hashes are called for in this case, but if you must do it, just use them to get at the two additional tables in the left join:
data want;
set have (where=(arefpce=203) rename=(aval=aval23));
if _N_=1 then do;
declare hash h49(dataset: 'have(where=(arefpce=409)');
h49.defineKey("id", "paramcd");
h49.defineData("aval");
h49.defineDone();
declare hash h79(dataset: 'have(where=(arefpce=709)');
h79.defineKey("id", "paramcd");
h79.defineData("aval");
h79.defineDone();
end;
if h49.find()=0 then
aval49=aval;
if h79.find()=0 then
aval79=aval;
keep id paramcd aval23 aval49 aval79;
run;