I posted this question awhile back and received a great solution via a Hash Object via user Ksharp data new;
input CustID TranAmt:dollar6. TranDate:mmddyy10.;
format TranDate mmddyy10.;
datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;
data want;
if _n_=1 then do;
if 0 then set new(rename=(TranAmt=_TranAmt));
declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
h.definekey('CustID','TranDate');
h.definedata('_TranAmt');
h.definedone();
end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
rc=h.find(key:CustId,key:i);
do while(rc=0);
count_8day+1;
sum_8day+_TranAmt;
rc=h.find_next(key:CustId,key:i);
end;
end;
drop i rc _TranAmt;
run; This works, for "small" datasets, but now I'm working with datasets in the 40-60 million observation range, and the Hash Object solution is throwing an error that's running out of memory. I've posted the HASH solution below for reference. The problem I'm trying to solve is count the number of transactions that occured in the last X days. The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions. Any suggestions on how to adapt this for much larger datasets? Thank you.
... View more