Sarah,
Here is some DATA step code. Please note:
1. If you have 18M records shaped as presented, you only need to load 3 variables into the hash table, which is roughly 3 * 8 * 18 million / 1024 / 1024 ≈ 412 MB. So in theory, if you have a gigabyte or more of memory (which is common), you should be able to fit the whole thing into the hash table; if you want to sanity-check that figure first, see the sketch after this list. If it does not fit, we will have to break the data up by ID, which takes two passes, meaning it will take at least double the amount of time to finish.
2. In theory the code could still be tweaked to be more efficient, for example by using a hash iterator (hiter) together with the SETCUR method (see the untested sketch at the end of this post); however, I am NOT sure how much benefit you would actually gain from it.
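Before loading all 18M rows, you can sanity-check the memory estimate from point 1: the hash object's ITEM_SIZE attribute reports the actual bytes per item, per-item overhead included, so the real footprint will run somewhat above the raw 3 * 8 bytes. A minimal sketch (the OBS= value is arbitrary, just enough to build the table):

data _null_;
   if 0 then set sample;   /* define host variables for the hash */
   dcl hash id(dataset:'sample (obs=100000 keep=individual_id prescriber_num date_dispensed)', multidata:'y');
   id.definekey('individual_id');
   id.definedata('individual_id','prescriber_num','date_dispensed');
   id.definedone();
   bytes_per_item = id.item_size;                     /* actual bytes per hash item, overhead included */
   projected_mb   = bytes_per_item * 18e6 / 1024**2;  /* scale up to the full 18M records */
   put bytes_per_item= projected_mb= comma12.1;
   stop;
run;

And here is the main step: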
data want;
   if _n_=1 then do;
      /* Compile-only SET: defines the host variables (_pre, _date) with the */
      /* correct types without reading any data.                             */
      if 0 then set sample (keep=individual_id prescriber_num date_dispensed rename=(prescriber_num=_pre date_dispensed=_date));

      /* Hash ID holds every (individual_id, prescriber, date) triplet;      */
      /* MULTIDATA:'y' allows duplicate keys, i.e. many rows per ID.         */
      dcl hash id(dataset:'sample (keep=individual_id prescriber_num date_dispensed rename=(prescriber_num=_pre date_dispensed=_date))', multidata:'y');
      id.definekey('individual_id');
      id.definedata('individual_id', '_pre', '_date');
      id.definedone();

      /* Hash PRE collects the distinct prescribers found in the window.     */
      dcl hash pre();
      pre.definekey('_pre');
      pre.definedone();
   end;

   set sample;

   /* Start of the one-year look-back window, computed once per observation. */
   _beg = intnx('year', date_dispensed, -1, 's');

   /* Walk all rows for this individual_id; ADD() silently rejects           */
   /* duplicates, so PRE ends up with one item per distinct prescriber.      */
   rc = id.find();
   do while (rc = 0);
      if _beg <= _date <= date_dispensed then rc = pre.add();
      rc = id.find_next();
   end;

   prov_cnt_back = pre.num_items;
   rc = pre.clear();   /* empty PRE for the next observation */

   drop rc _:;
run;
Update: INTNX() is now called only once per obs instead of up to 365 times, which may save you some time.
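For what it's worth, here is an untested sketch of the hiter + SETCUR idea from note 2. Assumptions: the hash must be declared ORDERED:'a' so that items with the same key sit next to each other in iteration order, and the key also goes into the data portion under a renamed variable _id so that NEXT() does not overwrite the current record's individual_id:

data want2;
   if _n_=1 then do;
      if 0 then set sample (keep=individual_id prescriber_num date_dispensed rename=(individual_id=_id prescriber_num=_pre date_dispensed=_date));
      dcl hash id(dataset:'sample (keep=individual_id prescriber_num date_dispensed rename=(individual_id=_id prescriber_num=_pre date_dispensed=_date))', multidata:'y', ordered:'a');
      id.definekey('_id');
      id.definedata('_id', '_pre', '_date');
      id.definedone();
      dcl hiter ih('id');
      dcl hash pre();
      pre.definekey('_pre');
      pre.definedone();
   end;
   set sample;
   _beg = intnx('year', date_dispensed, -1, 's');
   /* jump straight to this ID's first item, then walk forward until the key changes */
   rc = ih.setcur(key: individual_id);
   do while (rc = 0 and _id = individual_id);
      if _beg <= _date <= date_dispensed then rc = pre.add();
      rc = ih.next();
   end;
   prov_cnt_back = pre.num_items;
   rc = pre.clear();
   drop rc _:;
run;

Whether this actually beats FIND()/FIND_NEXT() depends on your data, so I would benchmark both on a subset before committing.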