Help using Base SAS procedures

Running sum in a transactions table

Accepted Solution Solved
Reply
N/A
Posts: 1
Accepted Solution

Running sum in a transactions table

I have a list of transactions, each with an associated date and account ID. Does anyone know of a simple way of determining if an account has accumulated a threshold number of dollars in a date range (ex. $X in any twelve-month period over the past ten years)?

For instance, I might need to know if an account gave $5000 in any twelve-month period in the past ten years. It sounds very simple, but I cannot find a way to achieve this without either creating a table for each day of the ten-year period or for creating a variable for each transaction in an account's transactional history (which would also yield a very large number of variables).

Thanks,

Mike


Accepted Solutions
Solution
‎01-28-2013 11:29 AM
Super Contributor
Posts: 578

Re: Running sum in a transactions table

data have;

input tx_id tx_date account_id amt;

informat tx_date date9.;

format tx_date date9.;

cards;

1 01jan2012 101 1000

2 02jan2012 101 1000

3 03jan2012 101 1000

4 04jan2012 101 1000

5 05jan2012 101 1000

6 06jan2012 101 1000

7 07jan2012 101 1000

8 08jan2012 101 1000

9 09jan2012 101 1000

10 10jan2012 101 1000

11 01jan2012 102 5000

12 01jan2012 103 100

;

run;

proc sql;

select distinct account_id from

(select

    t1.account_id

    ,t1.tx_date   

from

    work.have t1

    left outer join work.have t2

        on t1.account_id=t2.account_id

            and t1.tx_date <= t2.tx_date

            and intnx('year',t1.tx_date,1)>=t2.tx_date

group by t1.account_id, t1.tx_date

having sum(t2.amt)>=5000) ;

quit;

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Running sum in a transactions table

If the dates are SAS date values this may not be too difficult.

Proc format;

value tdate

/* this is where the work goes on by specifying date ranges of interest*/

'01APR2000'd   -    '31MAR2001'd

'01APR2001'd   -    '31MAR2002'd

'01APR2002'd   -    '31MAR2003'd

'01APR2003'd   -    '31MAR2004'd

'01APR2004'd   -    '31MAR2005'd

'01APR2005'd   -    '31MAR2006'd

'01APR2006'd   -    '31MAR2007'd  /* continue as needed to span your time range*/

;

run;

I would probably build a variety of permanent formats maybe 12 such as tdateJan tdateFeb tdateMar tdateApr (above)

and then change the format in the proc summary below.

proc summary data=yourdata nway;

    

     class AccountId TransActionDate;

     format TransactionDate Tdate.;

     var TransactionValue;

     output out=TransactionSum (where=(TransactionSum ge 5000)) sum=TransactionSum;

run

PROC Star
Posts: 7,474

Re: Running sum in a transactions table

Mike,

I couldn't write it without a sample dataset.  And, are you looking for individual transactions during the twelve month periods or a sum of the transactions within a month, or something else altogether?

If it is the sum of transactions, I would use proc summary to create a file that has the monthly sums for each id for each month of the ten-year period i.e., include a format that converted dates to monyy7..

Then, I'd use a datastep that only kept the ID number with (1) a 120 cell array to capture all of the months; (2) a loop that went through each of the possible 12 month periods and, in that loop, build a 12 cell array to capture the 12 months and use the sum function to determine if the value was ge 5000.

If that criterion is reached, output the ID number, exit the loop and begin the process over with the next id's information..

Solution
‎01-28-2013 11:29 AM
Super Contributor
Posts: 578

Re: Running sum in a transactions table

data have;

input tx_id tx_date account_id amt;

informat tx_date date9.;

format tx_date date9.;

cards;

1 01jan2012 101 1000

2 02jan2012 101 1000

3 03jan2012 101 1000

4 04jan2012 101 1000

5 05jan2012 101 1000

6 06jan2012 101 1000

7 07jan2012 101 1000

8 08jan2012 101 1000

9 09jan2012 101 1000

10 10jan2012 101 1000

11 01jan2012 102 5000

12 01jan2012 103 100

;

run;

proc sql;

select distinct account_id from

(select

    t1.account_id

    ,t1.tx_date   

from

    work.have t1

    left outer join work.have t2

        on t1.account_id=t2.account_id

            and t1.tx_date <= t2.tx_date

            and intnx('year',t1.tx_date,1)>=t2.tx_date

group by t1.account_id, t1.tx_date

having sum(t2.amt)>=5000) ;

quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 517 views
  • 6 likes
  • 4 in conversation