Hello,
I am trying to SUM transactions based on a 7 day period looking for accounts with $2,000 or more in transactions and I am having difficulty with the code and have not found anything through google that would solve my problem.
I have several months of transactional data by account and by date that I need to check if at any time in a 7 day period the account had $2,000 or more in transactions. Accounts can have more than 1 transaction per day and transactions can occur on any day (including holidays). The 7 day period can start/end on any day of the week.
I was hoping to use two a DO loop to determine the start date and end date however I have been unsucessful and would appreciate any guidance.
Below is a sample layout of the data I have:
Acct Transaction date
1234 50.00 01/01/2012
1234 1500.00 01/03/2012
4275 500.00 01/07/2012
There are no negative numbers in the transaction amount field.
Any help woudl be appreciated as I have spent the afternoon working on this without sucess. I only need to output records that meet the requirements.
It almost feels like I need a do loop with a proc sql inside a do loop but I have been unable to find anyway to do so.
data have;
informat date mmddyy10.;
format date mmddyy10.;
input Acct Transaction date;
cards;
1234 50.00 01/01/2012
1234 20.00 01/02/2012
1234 50.00 01/02/2012
1234 1500.00 01/03/2012
1234 50.00 01/04/2012
1234 300.00 01/03/2012
1234 50.00 01/06/2012
1234 1500.00 01/07/2012
1234 50.00 01/08/2012
1234 600.00 01/09/2012
1234 50.00 01/10/2012
1234 100.00 01/10/2012
2234 50.00 01/01/2012
2234 10.00 01/02/2012
2234 5.00 01/02/2012
2234 10.00 01/03/2012
2234 50.00 01/04/2012
2234 700.00 01/03/2012
2234 50.00 01/06/2012
2234 200.00 01/07/2012
2234 50.00 01/08/2012
2234 100.00 01/09/2012
2234 50.00 01/10/2012
2234 1800.00 01/10/2012
;
proc sql;
create table temp as select distinct acct, date from have;
create table want as select a.acct,a.date,sum(Transaction) as total
from temp a ,have b
where a.acct=b.acct and 0<=a.date-b.date<7
group by a.acct,a.date
having total>2000;
quit;
If I had to do this quickly I would try something like:
create interval for each record. If I did this right you now have the 7 day periods containg the transation.
data intervals;
set <existing data>;
do i = (-6) to 0 by 1;
interval = catx(' - ',put((date+i),mmddyy10.), put((date +i+6),mmddyy10.));
output;
end;
run;
proc summary data= intervals nway;
class account interval;
var transaction;
output out= transsummary (where=(transsum ge 2000)) sum=transsum;
run;
proc print data=transsummary;
var account interval transsum;
run;
data have;
informat date mmddyy10.;
format date mmddyy10.;
input Acct Transaction date;
cards;
1234 50.00 01/01/2012
1234 20.00 01/02/2012
1234 50.00 01/02/2012
1234 1500.00 01/03/2012
1234 50.00 01/04/2012
1234 300.00 01/03/2012
1234 50.00 01/06/2012
1234 1500.00 01/07/2012
1234 50.00 01/08/2012
1234 600.00 01/09/2012
1234 50.00 01/10/2012
1234 100.00 01/10/2012
2234 50.00 01/01/2012
2234 10.00 01/02/2012
2234 5.00 01/02/2012
2234 10.00 01/03/2012
2234 50.00 01/04/2012
2234 700.00 01/03/2012
2234 50.00 01/06/2012
2234 200.00 01/07/2012
2234 50.00 01/08/2012
2234 100.00 01/09/2012
2234 50.00 01/10/2012
2234 1800.00 01/10/2012
;
proc sql;
create table temp as select distinct acct, date from have;
create table want as select a.acct,a.date,sum(Transaction) as total
from temp a ,have b
where a.acct=b.acct and 0<=a.date-b.date<7
group by a.acct,a.date
having total>2000;
quit;
How about:
data have; informat date mmddyy10.; format date mmddyy10.; input Acct Transaction date; cards; 1234 50.00 01/01/2012 1234 20.00 01/02/2012 1234 50.00 01/02/2012 1234 1500.00 01/03/2012 1234 50.00 01/04/2012 1234 300.00 01/03/2012 1234 50.00 01/06/2012 1234 1500.00 01/07/2012 1234 50.00 01/08/2012 1234 600.00 01/09/2012 1234 50.00 01/10/2012 1234 100.00 01/10/2012 2234 50.00 01/01/2012 2234 10.00 01/02/2012 2234 5.00 01/02/2012 2234 10.00 01/03/2012 2234 50.00 01/04/2012 2234 700.00 01/03/2012 2234 50.00 01/06/2012 2234 200.00 01/07/2012 2234 50.00 01/08/2012 2234 100.00 01/09/2012 2234 50.00 01/10/2012 2234 1800.00 01/10/2012 ; proc sql; create table want as select a.*,(select sum(Transaction) as sum from have as b where b.acct=a.acct and b.date between a.date-6 and a.date ) as sum from have as a ; create table want_acct as select distinct acct from want where sum gt 2000; quit;
Ksharp
Hi Ksharp,
Our output datasets are different. I wonder which one is the one OP wants?
I guess my code is right . Just Guess.
Well I guess this is more complex than it seems:
if you switch 'a' and 'b' in your date range comparison, you will get different results, I couldn't figure out why.
You can't switch 'a' and 'b' in date range comparison.
b.date is rolling date , a.date is current obs's date .You can't exchange them.
Ksharp
Brandon,
I have a different approach, which is a little more complex. However, it has the potential to be much faster since it requires only one pass through the data. The idea is this, all contained in one DATA step. Compute the total transactions for each ACCT/DATE. Track the last 7 such totals. For each date, look through the 7 totals, and see which fall within the right time period. Sum them up. Here is what the code could look like (untested):
data want (keep = acct date);
set have;
by acct date;
array _7amnts {0:6} amnt1-amnt7;
array _7dates {0:6} date1-date7;
retain amnt1-amnt7 date1-date7;
if first.acct then do _i_=0 to 6;
_7amnts{_i_}=.;
_7dates{_i_}=.;
end;
if first.date then total = amount;
else total + amount;
if last.date;
dates_tracked + 1;
_7amnts{mod(dates_tracked,7)} = total;
_7dates{mod(dates_tracked,7)} = date;
last7_total=0;
do _i_=0 to 6;
if (date-6 < _7dates{_i_} <= date) then last7_total + _7amnts{_i_};
end;
if last7_total > 2000;
run;
Hope this helps (and hope it works)!
OK, notice 2 things. I have made a few minor edits, so this final program should be it ... subject to the fact that I used AMOUNT as the variable name instead of TRANSACTION. Again, hope it helps.
Astounding,
I need to add proc sort; by acct date; to run your code.
data have;
informat date mmddyy10.;
format date mmddyy10.;
input Acct amount date;
cards;
1234 50.00 01/01/2012
1234 20.00 01/02/2012
1234 50.00 01/02/2012
1234 1500.00 01/03/2012
1234 50.00 01/04/2012
1234 300.00 01/03/2012
1234 50.00 01/06/2012
1234 1500.00 01/07/2012
1234 50.00 01/08/2012
1234 600.00 01/09/2012
1234 50.00 01/10/2012
1234 100.00 01/10/2012
2234 50.00 01/01/2012
2234 10.00 01/02/2012
2234 5.00 01/02/2012
2234 10.00 01/03/2012
2234 50.00 01/04/2012
2234 700.00 01/03/2012
2234 50.00 01/06/2012
2234 200.00 01/07/2012
2234 50.00 01/08/2012
2234 100.00 01/09/2012
2234 50.00 01/10/2012
2234 1800.00 01/10/2012
;
proc sort;
by acct date;
data want (keep = acct date);
set have;
by acct date;
array _7amnts {0:6} amnt1-amnt7;
array _7dates {0:6} date1-date7;
retain amnt1-amnt7 date1-date7;
if first.acct then do _i_=0 to 6;
_7amnts{_i_}=.;
_7dates{_i_}=.;
end;
if first.date then total = amount;
else total + amount;
if last.date;
dates_tracked + 1;
_7amnts{mod(dates_tracked,7)} = total;
_7dates{mod(dates_tracked,7)} = date;
last7_total=0;
do _i_=0 to 6;
if (date-6 < _7dates{_i_} <= date) then last7_total + _7amnts{_i_};
end;
if last7_total > 2000;
run;
proc print;run;
Linlin,
Yes, the data needs to be in order. I assumed it was, since the original post said the data was "transactional data by account and by date". But if sorting is required, my solution might take longer to run.
Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.