SAS Programming

Fenton007 · Posted 08-26-2016 08:49 PM

First timer here, please be gentle!

I've included some code below as well as actual and expected results. My code is clearly not working and it was dumb luck I thought it was. I'm trying to count the number of transactions that occured in the last 8 days. The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions, where the code right now is showing 5.

I'd appreciate any possible solution you have to this problem, my back's against the wall trying to correct this. Thank you.

Jon

data new;

input CustID TranAmt:dollar6. TranDate:mmddyy10.;

datalines;

111 $100 6/1/2014

111 $200 6/2/2014

111 $500 6/4/2014

111 $500 6/10/2014

111 $100 6/11/2014

222 $200 6/3/2014

222 $500 6/4/2014

222 $600 6/4/2014

222 $600 6/11/2014

222 $600 6/19/2014

222 $600 6/28/2014

222 $600 6/29/2014;

run;

proc print;

run;

proc sort data=new;

by CustID TranDate;

run;

data new2;

set new;

by CustID;

Prev=lag1(TranDate);

if first.CustID=1 then TranCountLast8Days=1;

else if TranDate<prev+8 then TranCountLast8Days+1;

else TranCountLast8Days=1;

format TranDate Prev date9.;

run;

CustID	TranAmt	TranDate	Prev	TranCountLast8Days	EXPECTED
111	100	1-Jun-14	.	1	1
111	200	2-Jun-14	1-Jun-14	2	2
111	500	4-Jun-14	2-Jun-14	3	3
111	500	10-Jun-14	4-Jun-14	4	2
111	100	11-Jun-14	10-Jun-14	5	3
222	200	3-Jun-14	11-Jun-14	1	1
222	500	4-Jun-14	3-Jun-14	2	2
222	600	4-Jun-14	4-Jun-14	3	3
222	600	11-Jun-14	4-Jun-14	4	3
222	600	19-Jun-14	11-Jun-14	1	1
222	600	28-Jun-14	19-Jun-14	1	1
222	600	29-Jun-14	28-Jun-14	2	2

Ksharp · Posted 08-27-2016 10:11 PM

The simplest way is using Hash Table.
And assuming you have big memory to hold all these six million obs.
If that is not worked, tell me , I can use other skill.



data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
  h.definekey('CustID','TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:CustId,key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:CustId,key:i);
 end;
end;
drop i rc _TranAmt;
run;

View solution in original post

Ksharp · Posted 08-26-2016 09:38 PM

Assuming your table is not big .


data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;
proc sql;
create table want as
 select *,
 (select count(*) from new 
  where CustId=a.CustId 
   and TranDate between a.TranDate-7 and a.TranDate) as count
from new as a;
quit;

Fenton007 · Posted 08-27-2016 08:18 AM

Thanks. I'm away from my machine it wanted to add clarification on volume. There could be 3 million rows I would need to run this on.

Fenton007 · Posted 08-27-2016 12:52 PM

This worked great on the sample dataset, but I my real set as 6 million rows, and it's been running for 2 hours, which is just too long. In addition to this, I'll need to calcualate the sum of the TranAmt field, and I can already see the requests coming to do this for number of trans (and sum of trans) in 4 days, 8 days, 30, days, 60 days. Would these need separate queries? Granted a new methodology might be needed for this large of a dataset. Thanks again for your prompt response, for a guy learning SAS, and I look forward to any additional insights/solutions you might have.

Ksharp · Posted 08-27-2016 10:11 PM

The simplest way is using Hash Table.
And assuming you have big memory to hold all these six million obs.
If that is not worked, tell me , I can use other skill.



data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
  h.definekey('CustID','TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:CustId,key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:CustId,key:i);
 end;
end;
drop i rc _TranAmt;
run;

Fenton007 · Posted 09-22-2017 10:08 AM

So I'm back with another question. The HASH table example worked great, but I'm now working on a MUCH larger SAS Dataset about 60MM obs. The hash object keeps running out of memory and giving this message:

ERROR: Hash object added 26214384 items when memory failure occurred. FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase. ERROR: The SAS System stopped processing this step because of insufficient memory.

SAS is running on a Unix/SAS Grid environment. I've looked into setting a "memsize" setting, but not sure if (or how) to do it in this setting, or if it will work. If there's another methodology that will work I'm all ears. Thanks in advance for the assistance.

PGStats · Posted 08-26-2016 10:43 PM

It is not clear what the date limit is. Here are two alternatives:

data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;

/* Trans counts relative to a fixed date */

%let fixedDate='11JUN2014'd;
/* Could be %let fixedDate=Today(); */

proc sql;
select custId, count(*) as nbTransIn8Days
from new
where trandate > intnx("DAY", &fixedDate., -8)
group by custId;
quit;

/* Trans counts relative to the date of the last transaction for
each custId */

proc sql;
select custId, count(*) as nbTransIn8Days
from 
    (   select * 
        from new
        group by custId
        having tranDate > intnx("DAY", max(TranDate), -8)
    )
group by custId;
quit;

PG

Fenton007 · Posted 08-31-2016 10:52 AM

Worked like a charm. I can't thank you enough!!!

SAS Programming

Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Re: Count Transactions in last 8 days

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...