BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fenton007
Fluorite | Level 6

 

First timer here, please be gentle!

 

I've included some code below as well as actual and expected results.  My code is clearly not working and it was dumb luck I thought it was.  I'm trying to count the number of transactions that occured in the last 8 days.  The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions, where the code right now is showing 5.

 

I'd appreciate any possible solution you have to this problem, my back's against the wall trying to correct this.  Thank you.

 

Jon 

 

data new;

  input CustID TranAmt:dollar6. TranDate:mmddyy10.;

  datalines;

111 $100 6/1/2014

111 $200 6/2/2014

111 $500 6/4/2014

111 $500 6/10/2014

111 $100 6/11/2014

222 $200 6/3/2014

222 $500 6/4/2014

222 $600 6/4/2014

222 $600 6/11/2014

222 $600 6/19/2014

222 $600 6/28/2014

222 $600 6/29/2014;

run;

 

proc print;

run;

 

proc sort data=new;

  by CustID TranDate;

run;

 

data new2;

  set new;

  by CustID;

  Prev=lag1(TranDate);

  if first.CustID=1 then TranCountLast8Days=1;

  else if TranDate<prev+8 then TranCountLast8Days+1;

  else TranCountLast8Days=1;

  format TranDate Prev date9.;

run;

 

CustID

TranAmt

TranDate

Prev

TranCountLast8Days

EXPECTED

111

100

1-Jun-14

.

1

1

111

200

2-Jun-14

1-Jun-14

2

2

111

500

4-Jun-14

2-Jun-14

3

3

111

500

10-Jun-14

4-Jun-14

4

2

111

100

11-Jun-14

10-Jun-14

5

3

222

200

3-Jun-14

11-Jun-14

1

1

222

500

4-Jun-14

3-Jun-14

2

2

222

600

4-Jun-14

4-Jun-14

3

3

222

600

11-Jun-14

4-Jun-14

4

3

222

600

19-Jun-14

11-Jun-14

1

1

222

600

28-Jun-14

19-Jun-14

1

1

222

600

29-Jun-14

28-Jun-14

2

2

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
The simplest way is using Hash Table.
And assuming you have big memory to hold all these six million obs.
If that is not worked, tell me , I can use other skill.



data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
  h.definekey('CustID','TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:CustId,key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:CustId,key:i);
 end;
end;
drop i rc _TranAmt;
run;



View solution in original post

7 REPLIES 7
Ksharp
Super User
Assuming your table is not big .


data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;
proc sql;
create table want as
 select *,
 (select count(*) from new 
  where CustId=a.CustId 
   and TranDate between a.TranDate-7 and a.TranDate) as count
from new as a;
quit;


Fenton007
Fluorite | Level 6
Thanks. I'm away from my machine it wanted to add clarification on volume. There could be 3 million rows I would need to run this on.
Fenton007
Fluorite | Level 6

This worked great on the sample dataset, but I my real set as 6 million rows, and it's been running for 2 hours, which is just too long.  In addition to this, I'll need to calcualate the sum of the TranAmt field, and I can already see the requests coming to do this for number of trans (and sum of trans) in 4 days, 8 days, 30, days, 60 days.  Would these need separate queries?  Granted a new methodology might be needed for this large of a dataset.  Thanks again for your prompt response, for a guy learning SAS, and I look forward to any additional insights/solutions you might have.

Ksharp
Super User
The simplest way is using Hash Table.
And assuming you have big memory to hold all these six million obs.
If that is not worked, tell me , I can use other skill.



data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
  h.definekey('CustID','TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:CustId,key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:CustId,key:i);
 end;
end;
drop i rc _TranAmt;
run;



Fenton007
Fluorite | Level 6

So I'm back with another question.  The HASH table example worked great, but I'm now working on a MUCH larger SAS Dataset about 60MM obs.  The hash object keeps running out of memory and giving this message:

 

ERROR: Hash object added 26214384 items when memory failure occurred. FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase. ERROR: The SAS System stopped processing this step because of insufficient memory.

 

SAS is running on a Unix/SAS Grid environment.  I've looked into setting a "memsize" setting, but not sure if (or how) to do it in this setting, or if it will work.  If there's another methodology that will work I'm all ears.  Thanks in advance for the assistance.

PGStats
Opal | Level 21

It is not clear what the date limit is. Here are two alternatives:

 

data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;

/* Trans counts relative to a fixed date */

%let fixedDate='11JUN2014'd;
/* Could be %let fixedDate=Today(); */

proc sql;
select custId, count(*) as nbTransIn8Days
from new
where trandate > intnx("DAY", &fixedDate., -8)
group by custId;
quit;

/* Trans counts relative to the date of the last transaction for
each custId */

proc sql;
select custId, count(*) as nbTransIn8Days
from 
    (   select * 
        from new
        group by custId
        having tranDate > intnx("DAY", max(TranDate), -8)
    )
group by custId;
quit; 
PG
Fenton007
Fluorite | Level 6
Worked like a charm. I can't thank you enough!!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1628 views
  • 1 like
  • 3 in conversation