@ukfirebrand Thank you for the interesting feedback; it makes for a nice discussion thread. It's difficult for me to gauge performance on PC SAS in a university lab. Anyway, I am learning from people like you who work in a production environment.

Out of curiosity, may I ask a few questions? Do you have any stats on the memory, CPU and I/O consumed by the hash process? Did you check the MEMSIZE option? How big are your datasets? Did you raise hashexp to the maximum, i.e. 20, to make the table sufficiently large? Any idea how much RAM you have?

A return code from rc = object.DEFINEDONE(MEMRC: 'y'); will let you know about a failure even at instantiation time. Personally, I would explore and exhaust every hash option if I knew my hardware and RAM well, because hashing is an area of profound interest to me.

Anyway, for more fun, here is another way using a double DOW loop. If you have the time and interest, please let me know how it performs.

data have;
input id seq source_id;
cards;
1 1 1
1 2 3
1 3 11
1 4 1
1 5 11
1 6 3
1 7 11
1 8 11
1 9 1
1 10 1
1 11 1
1 12 3
1 13 1
1 14 11
1 15 3
;
run;
data want;
  array temp(100) _temporary_; /* subscript arbitrary for test purposes */
  call missing(_k);
  /* First pass over the BY group: record the row numbers to keep in temp */
  do _n_=1 by 1 until(last.id);
    set have;
    by id seq;
    if first.id then call missing(of temp(*));
    /* _k flags that an 11 has been seen in this group */
    if source_id=11 then _k=1;
    /* Note: each lag() call keeps its own queue, which advances only when that call executes */
    if first.id and last.id and source_id=11 then temp(_n_)=_n_;
    else if source_id=1 and lag(source_id)=11 then temp(_n_)=_n_;
    else if lag(source_id)=11 and (source_id=11 or source_id not in (1,11)) then temp(_n_)=_n_-1;
    if source_id=1 and lag(source_id)=1 and _k then temp(_n_)=_n_;
  end;
  /* Second pass: re-read the same BY group and output the recorded rows */
  do _n_=1 by 1 until(last.id);
    set have;
    by id seq;
    if _n_ in temp then output;
  end;
  drop _k;
run;
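
On the hash questions above, here is a minimal sketch of what I mean by raising hashexp and checking the DEFINEDONE return code. It loads the sample dataset have, keyed on id and seq; that key choice and the names h, rc, n and sz are only assumptions for illustration, so adapt them to your real data.

```sas
data _null_;
  /* Define id, seq and source_id in the PDV without reading any rows */
  if 0 then set have;

  /* hashexp:20 requests 2**20 hash buckets, the largest setting */
  declare hash h(dataset:'have', hashexp:20);
  rc = h.defineKey('id', 'seq');      /* assumed key, for illustration only */
  rc = h.defineData('source_id');

  /* memrc:'y' asks for a nonzero return code if the load runs out of memory */
  rc = h.defineDone(memrc:'y');

  if rc ne 0 then put 'Hash load failed: ' rc=;
  else do;
    n  = h.num_items;   /* number of items loaded */
    sz = h.item_size;   /* size of one item in bytes */
    put 'Hash loaded: ' n= sz=;
  end;
  stop;
run;
```

Multiplying num_items by item_size gives only a rough idea of the memory the table holds, but it is a quick figure to compare against MEMSIZE and the RAM on the machine.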