First timer here, please be gentle!
I've included some code below as well as actual and expected results. My code is clearly not working and it was dumb luck I thought it was. I'm trying to count the number of transactions that occured in the last 8 days. The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions, where the code right now is showing 5.
I'd appreciate any possible solution you have to this problem, my back's against the wall trying to correct this. Thank you.
Jon
data new;
input CustID TranAmt:dollar6. TranDate:mmddyy10.;
datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014;
run;
proc print;
run;
proc sort data=new;
by CustID TranDate;
run;
data new2;
set new;
by CustID;
Prev=lag1(TranDate);
if first.CustID=1 then TranCountLast8Days=1;
else if TranDate<prev+8 then TranCountLast8Days+1;
else TranCountLast8Days=1;
format TranDate Prev date9.;
run;
CustID | TranAmt | TranDate | Prev | TranCountLast8Days | EXPECTED |
111 | 100 | 1-Jun-14 | . | 1 | 1 |
111 | 200 | 2-Jun-14 | 1-Jun-14 | 2 | 2 |
111 | 500 | 4-Jun-14 | 2-Jun-14 | 3 | 3 |
111 | 500 | 10-Jun-14 | 4-Jun-14 | 4 | 2 |
111 | 100 | 11-Jun-14 | 10-Jun-14 | 5 | 3 |
222 | 200 | 3-Jun-14 | 11-Jun-14 | 1 | 1 |
222 | 500 | 4-Jun-14 | 3-Jun-14 | 2 | 2 |
222 | 600 | 4-Jun-14 | 4-Jun-14 | 3 | 3 |
222 | 600 | 11-Jun-14 | 4-Jun-14 | 4 | 3 |
222 | 600 | 19-Jun-14 | 11-Jun-14 | 1 | 1 |
222 | 600 | 28-Jun-14 | 19-Jun-14 | 1 | 1 |
222 | 600 | 29-Jun-14 | 28-Jun-14 | 2 | 2 |
The simplest way is using Hash Table. And assuming you have big memory to hold all these six million obs. If that is not worked, tell me , I can use other skill. data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; data want; if _n_=1 then do; if 0 then set new(rename=(TranAmt=_TranAmt)); declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y'); h.definekey('CustID','TranDate'); h.definedata('_TranAmt'); h.definedone(); end; set new; count_8day=0; sum_8day=0; do i=TranDate-7 to TranDate; rc=h.find(key:CustId,key:i); do while(rc=0); count_8day+1; sum_8day+_TranAmt; rc=h.find_next(key:CustId,key:i); end; end; drop i rc _TranAmt; run;
Assuming your table is not big . data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; proc sql; create table want as select *, (select count(*) from new where CustId=a.CustId and TranDate between a.TranDate-7 and a.TranDate) as count from new as a; quit;
This worked great on the sample dataset, but I my real set as 6 million rows, and it's been running for 2 hours, which is just too long. In addition to this, I'll need to calcualate the sum of the TranAmt field, and I can already see the requests coming to do this for number of trans (and sum of trans) in 4 days, 8 days, 30, days, 60 days. Would these need separate queries? Granted a new methodology might be needed for this large of a dataset. Thanks again for your prompt response, for a guy learning SAS, and I look forward to any additional insights/solutions you might have.
The simplest way is using Hash Table. And assuming you have big memory to hold all these six million obs. If that is not worked, tell me , I can use other skill. data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; data want; if _n_=1 then do; if 0 then set new(rename=(TranAmt=_TranAmt)); declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y'); h.definekey('CustID','TranDate'); h.definedata('_TranAmt'); h.definedone(); end; set new; count_8day=0; sum_8day=0; do i=TranDate-7 to TranDate; rc=h.find(key:CustId,key:i); do while(rc=0); count_8day+1; sum_8day+_TranAmt; rc=h.find_next(key:CustId,key:i); end; end; drop i rc _TranAmt; run;
So I'm back with another question. The HASH table example worked great, but I'm now working on a MUCH larger SAS Dataset about 60MM obs. The hash object keeps running out of memory and giving this message:
ERROR: Hash object added 26214384 items when memory failure occurred. FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase. ERROR: The SAS System stopped processing this step because of insufficient memory.
SAS is running on a Unix/SAS Grid environment. I've looked into setting a "memsize" setting, but not sure if (or how) to do it in this setting, or if it will work. If there's another methodology that will work I'm all ears. Thanks in advance for the assistance.
It is not clear what the date limit is. Here are two alternatives:
data new;
input CustID TranAmt:dollar6. TranDate:mmddyy10.;
datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
/* Trans counts relative to a fixed date */
%let fixedDate='11JUN2014'd;
/* Could be %let fixedDate=Today(); */
proc sql;
select custId, count(*) as nbTransIn8Days
from new
where trandate > intnx("DAY", &fixedDate., -8)
group by custId;
quit;
/* Trans counts relative to the date of the last transaction for
each custId */
proc sql;
select custId, count(*) as nbTransIn8Days
from
( select *
from new
group by custId
having tranDate > intnx("DAY", max(TranDate), -8)
)
group by custId;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.