- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First timer here, please be gentle!
I've included some code below as well as actual and expected results. My code is clearly not working and it was dumb luck I thought it was. I'm trying to count the number of transactions that occured in the last 8 days. The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions, where the code right now is showing 5.
I'd appreciate any possible solution you have to this problem, my back's against the wall trying to correct this. Thank you.
Jon
data new;
input CustID TranAmt:dollar6. TranDate:mmddyy10.;
datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014;
run;
proc print;
run;
proc sort data=new;
by CustID TranDate;
run;
data new2;
set new;
by CustID;
Prev=lag1(TranDate);
if first.CustID=1 then TranCountLast8Days=1;
else if TranDate<prev+8 then TranCountLast8Days+1;
else TranCountLast8Days=1;
format TranDate Prev date9.;
run;
CustID | TranAmt | TranDate | Prev | TranCountLast8Days | EXPECTED |
111 | 100 | 1-Jun-14 | . | 1 | 1 |
111 | 200 | 2-Jun-14 | 1-Jun-14 | 2 | 2 |
111 | 500 | 4-Jun-14 | 2-Jun-14 | 3 | 3 |
111 | 500 | 10-Jun-14 | 4-Jun-14 | 4 | 2 |
111 | 100 | 11-Jun-14 | 10-Jun-14 | 5 | 3 |
222 | 200 | 3-Jun-14 | 11-Jun-14 | 1 | 1 |
222 | 500 | 4-Jun-14 | 3-Jun-14 | 2 | 2 |
222 | 600 | 4-Jun-14 | 4-Jun-14 | 3 | 3 |
222 | 600 | 11-Jun-14 | 4-Jun-14 | 4 | 3 |
222 | 600 | 19-Jun-14 | 11-Jun-14 | 1 | 1 |
222 | 600 | 28-Jun-14 | 19-Jun-14 | 1 | 1 |
222 | 600 | 29-Jun-14 | 28-Jun-14 | 2 | 2 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The simplest way is using Hash Table. And assuming you have big memory to hold all these six million obs. If that is not worked, tell me , I can use other skill. data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; data want; if _n_=1 then do; if 0 then set new(rename=(TranAmt=_TranAmt)); declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y'); h.definekey('CustID','TranDate'); h.definedata('_TranAmt'); h.definedone(); end; set new; count_8day=0; sum_8day=0; do i=TranDate-7 to TranDate; rc=h.find(key:CustId,key:i); do while(rc=0); count_8day+1; sum_8day+_TranAmt; rc=h.find_next(key:CustId,key:i); end; end; drop i rc _TranAmt; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming your table is not big . data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; proc sql; create table want as select *, (select count(*) from new where CustId=a.CustId and TranDate between a.TranDate-7 and a.TranDate) as count from new as a; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This worked great on the sample dataset, but I my real set as 6 million rows, and it's been running for 2 hours, which is just too long. In addition to this, I'll need to calcualate the sum of the TranAmt field, and I can already see the requests coming to do this for number of trans (and sum of trans) in 4 days, 8 days, 30, days, 60 days. Would these need separate queries? Granted a new methodology might be needed for this large of a dataset. Thanks again for your prompt response, for a guy learning SAS, and I look forward to any additional insights/solutions you might have.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The simplest way is using Hash Table. And assuming you have big memory to hold all these six million obs. If that is not worked, tell me , I can use other skill. data new; input CustID TranAmt:dollar6. TranDate:mmddyy10.; format TranDate mmddyy10.; datalines; 111 $100 6/1/2014 111 $200 6/2/2014 111 $500 6/4/2014 111 $500 6/10/2014 111 $100 6/11/2014 222 $200 6/3/2014 222 $500 6/4/2014 222 $600 6/4/2014 222 $600 6/11/2014 222 $600 6/19/2014 222 $600 6/28/2014 222 $600 6/29/2014 ; run; data want; if _n_=1 then do; if 0 then set new(rename=(TranAmt=_TranAmt)); declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y'); h.definekey('CustID','TranDate'); h.definedata('_TranAmt'); h.definedone(); end; set new; count_8day=0; sum_8day=0; do i=TranDate-7 to TranDate; rc=h.find(key:CustId,key:i); do while(rc=0); count_8day+1; sum_8day+_TranAmt; rc=h.find_next(key:CustId,key:i); end; end; drop i rc _TranAmt; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So I'm back with another question. The HASH table example worked great, but I'm now working on a MUCH larger SAS Dataset about 60MM obs. The hash object keeps running out of memory and giving this message:
ERROR: Hash object added 26214384 items when memory failure occurred. FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase. ERROR: The SAS System stopped processing this step because of insufficient memory.
SAS is running on a Unix/SAS Grid environment. I've looked into setting a "memsize" setting, but not sure if (or how) to do it in this setting, or if it will work. If there's another methodology that will work I'm all ears. Thanks in advance for the assistance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is not clear what the date limit is. Here are two alternatives:
data new;
input CustID TranAmt:dollar6. TranDate:mmddyy10.;
datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
/* Trans counts relative to a fixed date */
%let fixedDate='11JUN2014'd;
/* Could be %let fixedDate=Today(); */
proc sql;
select custId, count(*) as nbTransIn8Days
from new
where trandate > intnx("DAY", &fixedDate., -8)
group by custId;
quit;
/* Trans counts relative to the date of the last transaction for
each custId */
proc sql;
select custId, count(*) as nbTransIn8Days
from
( select *
from new
group by custId
having tranDate > intnx("DAY", max(TranDate), -8)
)
group by custId;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content