DATA Step, Macro, Functions and more

Count/Total Transactions in Last X Days

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Count/Total Transactions in Last X Days

I posted this question awhile back and received a great solution via a Hash Object via user Ksharp

data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(dataset:'new(rename=(TranAmt=_TranAmt))',hashexp:20,multidata:'y');
  h.definekey('CustID','TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;
set new;
count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:CustId,key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:CustId,key:i);
 end;
end;
drop i rc _TranAmt;
run;

This works, for "small" datasets, but now I'm working with datasets in the 40-60 million observation range, and the Hash Object solution is throwing an error that's running out of memory.  I've posted the HASH solution below for reference.

 

The problem I'm trying to solve is count the number of transactions that occured in the last X days.  The best example is saying if on 6/11/16, I want to know how many transactions happened in the prior 8 days, so including 6/11, that would be transactions from 6/11 - 6/4, or 3 total transactions.

 

Any suggestions on how to adapt this for much larger datasets?  Thank you.


Accepted Solutions
Solution
‎09-28-2017 09:53 AM
Super User
Posts: 10,611

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

@Haikuo is right. Hold one group value each time  can save a lot more memory .

 

 

data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(hashexp:20,multidata:'y');
  h.definekey('TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;

do until(last.CustID);
set new;
by CustID;
 _TranAmt=TranAmt;h.add();
end;


do until(last.CustID);
set new;
by CustID;

count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:i);
 end;
end;

output;
end;

h.clear();
drop i rc _TranAmt;
run;

View solution in original post


All Replies
Super User
Posts: 13,023

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

Just count?

Used 18Jun2014 as an arbitrary day since NONE of your example data would have been returned for anything related to todays date unless you were looking for 1000 days or so.

 

proc summary data=new nway;
   where trandate ge ('18Jun2014'd - 8);
   class custid;
   var tranamt;
   output out=want (drop= _:) n=TranCount;
run;

 

 

Assumed that you meant transaction by custid.

Occasional Contributor
Posts: 12

Re: Count/Total Transactions in Last X Days

I'm not in front of my machine with SAS loaded on it, so I can't test... but I don't believe this will work since it has a hard-coded date.  I'm basically trying to do a "rolling" look back 8 days on each record.  I didn't realize I didn't include a required output dataset.  It's listed below.

 

THank you for your prompt responses.

 

CustID

TranAmtTranDateEXPECTED
1111001-Jun-141
1112002-Jun-142
1115004-Jun-143
11150010-Jun-142
11110011-Jun-143
2222003-Jun-141
2225004-Jun-142
2226004-Jun-143
22260011-Jun-143
22260019-Jun-141
22260028-Jun-141
22260029-Jun-142
Valued Guide
Posts: 559

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

When you use hash objects ALL the data loaded into the hash has to be loaded into memory and it looks like you’ve hit the limit in your environment. There’s no way of guaranteeing you won’t ever run out of memory so if you are using colossal data sets you’re better off using an alternative method as @ballardw suggests.

Trusted Advisor
Posts: 1,284

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

Although your example doesn't clarify the issue, I presume you want rolling counts and sums.

 

Now if your data is sorted by custid, you could revise your program to process the data one custid at a time, by use of two "do until last.custid" loops containing a SET statement.  The first loop would build the hash object one item at a time, and the second would do pretty much what you're doing now.  At the end of the second loop, clear the hash object (rc=h.clear()) to recover memory.

 

But this would be an expensive proposition - adding items to object h one at a time instead of as a sort of "batch load".

 

You'll be much better off with temporary arrays (indexed on dates), one for date-specific total count, and one for date-specific total transamt:

 

%let lobound=%eval(%sysfunc(inputn(01jan2012,date9.))-7);
%let upbound=%sysfunc(inputn(31dec2016,date9.));

data want (keep=custid trandate tranamt count_8day sum_8day);

  array trcnt{&lobound:&upbound} _temporary_;
  array tramt{&lobound:&upbound} _temporary_;
  call missing(of tramt{*},of trcnt{*});
  do until (last.custid);
    set new;
    by custid;
    trcnt{trandate}+1;
    tramt{trandate}+tranamt;
  end;

  do until (last.custid);
    set new;
    by custid;
    count_8day=0;
    sum_8day=0;
    do i=0 to 7;
      count_8day=sum(count_8day,trcnt{trandate-i});
      sum_8day=sum(sum_8day,tramt{trandate-i});
    end;
    output;
  end;
run;

 

 

 

  1. Arrays can be indexed like this: array x{20:40} meaning the lower bound of array X is 20 and the upper bound is 40.  Since dates are whole numbers, if you know the minimum and maximum possible dates for your data you can use those as the bounds, and you could use TRANDATE as the array index.  Just be sure to move the lower bound down by an extra 7 so that even the lowest incoming date won't go out of range looking at the previous week's history.

  2. Unfortunately you can't declare an array with date literals as bounds.  I.e. you can't use:  

       array mydata{'01jan2012'd:'31dec2016'd};

    So you have to ask the macro processor to get that value.  That's why I have  

       %let lobound=%eval(%sysfunc(inputn(01jan2012,date9.))-7);

       %let upbound=%sysfunc(inputn(31dec2016,date9.));

    and

       array trcnt{&lobound:&upbound} _temporary_;

       array tramt{&lobound:&upbound} _temporary_;

  3. Because this recycles the same array for each custid, it uses a lot less memory than the hash object you are using.  Also it's faster than the hash object, since the array is key-indexed.  (the hash object might end up being faster if you were retrieving a lot more variables in definedata method).
Occasional Contributor
Posts: 12

Re: Count/Total Transactions in Last X Days

Thanks for the thorough explanation. Would using Tran_dt as the array index be a problem if there are multiple transactions on the same date for the same cust_id. Ex... cust 111 has 6 obs, all dated 1/1/2016?
Trusted Advisor
Posts: 1,284

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

You appear to want the total count and total tranamt for each date, so notice that the first "do until" group uses:

    trcnt{trandate}+1;

    tramt{trandate}+tranamt;

instead of

    trcnt{trandate}=1;

    tramt{trandate}=tranamt;

resulting in one array element per date with accumluated values for that date.

 

Yes, your hash object would have two items if a given date has two records.  But when you retrieve those data items, you accumulate the results.  This just accumulates data prior to retrieval.

 

It would be a problem if you needed something other than accumulations within a date.

Respected Advisor
Posts: 3,162

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007
data new;
	input CustID TranAmt:dollar6. TranDate:mmddyy10.;
	format TranDate mmddyy10.;
	datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

/*hwo about only loading one custid at a time, following tweak on Ksharp's code should circumvent or at least alleviate ram stress*/
data want_hash;
	if _n_=1 then
		do;
			declare hash h(multidata:'y', ordered:'y');
			h.definekey('TranDate');
			h.definedata('TranDate','_TranAmt');
			h.definedone();

			call missing (_TranAmt);
		end;

	do until (last.custid);
		set new;
		by CustID TranDate;

		if first.Custid then
			rc=h.clear();
		rc=h.add(key:trandate,data:trandate, data:tranamt);
	end;

	do until (last.custid);
		set new;
		by CustID TranDate;
		sum_8day=0;
		count_8day=0;

		do i=TranDate-7 to TranDate;
			rc=h.find(key:i);

			do while(rc=0);
				sum_8day+_TranAmt;
				count_8day+1;
				rc=h.find_next(key:i);
			end;
		end;

		output;
	end;
	drop i rc _TranAmt;
run;

/*this will slow, but a lot easier to code and read*/
proc sql;
	create table want_sql as
		select *, (select sum(tranamt) from new a where a.custid=b.custid and  b.trandate-a.trandate between 0 and 7) as sum_8days,
(select count(*) from new a where a.custid=b.custid and  b.trandate-a.trandate between 0 and 7) as count_8days
from new b
	;
quit;
Solution
‎09-28-2017 09:53 AM
Super User
Posts: 10,611

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

@Haikuo is right. Hold one group value each time  can save a lot more memory .

 

 

data new;
  input CustID TranAmt:dollar6. TranDate:mmddyy10.;
  format TranDate mmddyy10.;
  datalines;
111 $100 6/1/2014
111 $200 6/2/2014
111 $500 6/4/2014
111 $500 6/10/2014
111 $100 6/11/2014
222 $200 6/3/2014
222 $500 6/4/2014
222 $600 6/4/2014
222 $600 6/11/2014
222 $600 6/19/2014
222 $600 6/28/2014
222 $600 6/29/2014
;
run;

data want;
 if _n_=1 then do;
  if 0 then set new(rename=(TranAmt=_TranAmt));
  declare hash h(hashexp:20,multidata:'y');
  h.definekey('TranDate');
  h.definedata('_TranAmt');
  h.definedone();
 end;

do until(last.CustID);
set new;
by CustID;
 _TranAmt=TranAmt;h.add();
end;


do until(last.CustID);
set new;
by CustID;

count_8day=0;
sum_8day=0;
do i=TranDate-7 to TranDate;
 rc=h.find(key:i);
 do while(rc=0);
  count_8day+1;
  sum_8day+_TranAmt;
  rc=h.find_next(key:i);
 end;
end;

output;
end;

h.clear();
drop i rc _TranAmt;
run;
Occasional Contributor
Posts: 12

Re: Count/Total Transactions in Last X Days

@Ksharp thanks again for the tip! It took 13 hours on 40MM obs, but it ran and looks good. I can't thank you and the other posters, @Haikuo and @mkeintz.
Trusted Advisor
Posts: 1,284

Re: Count/Total Transactions in Last X Days

Posted in reply to Fenton007

13 hours?   If you have to do this again, I'd strongly suggest using the array solution, which should be noticably faster.  In the program code below processing 100,000,000 skinny observaions, the T2 dataset (using arrays) took about half the time to create as the T1 (using hash).  About 84 seconds vs 44 on our server.

 

hash is more flexible, but when it comes to looking up single values, it is intrinsically slower than arrays, since the latter has a key pointing directly to the array location of data value, whereas hash needs to process the key value before locating the data value.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 264 views
  • 1 like
  • 6 in conversation