BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nav_Sweeney
Calcite | Level 5

“Trend Analysis” - SAS logic?

I am looking to identify a trend for client’s balances of a Bank whose historical data exist in the dataset for a calendar year of 12 months.

To analyze trend I would like to see how balances of clients increase, decrease or stay the same using the following business rule by creating 3 new variables.

1. /* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/

2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance   */

3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance   */

The VERY LARGE dataset looks the following with the 3 columns:

clientid           Date                     Balances

1100020       07/02/2011            37897

1100020       07/03/2011             89789

1100345        07/02/2011            78989

1100768        08/04/2011            768687

<-and so on..................------>

/*multiple clients-so multiple client ids, their balances for the dates between the period Jan 2011 and Dec2012-so 12 months data*/

I tried a self join In proc sql with intnx and aggregate function however I didn’t get the desired results.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Without the time series your data will work as is but you'll have to change the variable names below to what you end up with. The column headings for the monthly values aren't valid SAS variable namea and you may either end up with _08_31_2012 or VAR1 by default depending on import method. In some respects if they come in as VAR1 through VAR12 the coding may be easier.

/* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/

trend = (VAR12 - VAR1)/ VAR1;

2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance   */

Short_term_growth = (max( of var10-var12) - Var12)/ Var12;

3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance   */

long_term_growth = (max( of var1-var12) - Var12)/ Var12;

growth = VAR12 - mean(of var1-var12);

And if you sort by client ID you could reshape the data as recommended by ETS_KPS above and run a simple linear regression on model of value = date and look at the slope parameter and statistics which might be telling if there is a big change in balance between the last and next-to-last month. Consider what happens if months 2 to 11 show and average of 100 increase in balance but the month 12 balance is roughly the same as the month 1 balance.


View solution in original post

9 REPLIES 9
TomKari
Onyx | Level 15

1. Do you want to do this for one client, a few selected clients, or all of the clients in your file?

2. Is it a rolling monthly summary that you want (i.e. end of Jan, end of Feb, ...), for all of the months of the year, or is it just for annual periods? If the latter, what is the month of the end of the period?

3. How large is very large? tens of millions, hundreds of millions, billions?

4. What storage technology / database is your source data in? Reason for asking is that in requests like this, the ability to push processing into a database can DRASTICALLY improve speed.

Tom

eyespike1
Calcite | Level 5

Hi, did you ever get this figured out? I am doing something similar and would like some guidance.

Seth

TomKari
Onyx | Level 15

Same questions that I asked Nav_Sweeney.

Tom

eyespike1
Calcite | Level 5

Hi Tom,

Thanks for reply.

All the clients in the file, the flat file is less that 50,000 records currently in excel. Annual periods. This is what my data looks like:

ID08_31_201209_30_201210_31_201211_30_201212_31_201201_31_201302_28_201303_31_201304_30_201305_31_201306_30_201307_31_2013
410764358317327167167379336349363369327348
892543032613039898279227272275274291317
65320233211234101101280250264267304263302
1633282181882279898249230227251246238277
369191741982438080218203210236211197232
28341781592076161177171183191179180204
4486913232566083179192

Thanks!

Seth

ets_kps
SAS Employee


Well, the first thing you need to do is reshape your data from WIDE to LONG with PROC TRANSPOSE.

proc transpose data=wide out=long;
  by date;
run;

(the first post doesn't need this)

Make sure your date variable is formatted as a date and then you can use the following code to accomplish the 12 month percentage change. It uses a new ETS procedure called TIMEDATA (h/t to the TIMEDATA developers for this). The other operations will be very similar.

Just a little background on TIMEDATA... It is extremely efficient at the types of time series operations that you are trying to do.

data verylarge; set sashelp.air;

clientid=1;

balances=air;

run;

proc timedata data=VERYLARGE out=_NULL_ outarray=trend plot=arrays;

by clientid;

id date interval=month accumulate=average;

var balances;

outarray trend;

do t=1 to _LENGTH_;

     if t < _SEASONALITY_ then trend = .;

           else do;

          previous = balances[t-_SEASONALITY_];

          recent   = balances;

               if previous = 0 then trend = .;

                                 else trend = (recent - previous) / previous;

          end;

      end;

run;

eyespike1
Calcite | Level 5

Hi, unfortunately we are not licensed to use Time Series Analysis.

Seth

ballardw
Super User

Do you need/want the same three calculations as the original post or something else to determine "trend".


eyespike1
Calcite | Level 5

Hi, Yes the first 3 would be good... but I also think ((latest month) minus (Average of first month through latest month)) would be a good demonstration of growth/loss.

ballardw
Super User

Without the time series your data will work as is but you'll have to change the variable names below to what you end up with. The column headings for the monthly values aren't valid SAS variable namea and you may either end up with _08_31_2012 or VAR1 by default depending on import method. In some respects if they come in as VAR1 through VAR12 the coding may be easier.

/* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/

trend = (VAR12 - VAR1)/ VAR1;

2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance   */

Short_term_growth = (max( of var10-var12) - Var12)/ Var12;

3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance   */

long_term_growth = (max( of var1-var12) - Var12)/ Var12;

growth = VAR12 - mean(of var1-var12);

And if you sort by client ID you could reshape the data as recommended by ETS_KPS above and run a simple linear regression on model of value = date and look at the slope parameter and statistics which might be telling if there is a big change in balance between the last and next-to-last month. Consider what happens if months 2 to 11 show and average of 100 increase in balance but the month 12 balance is roughly the same as the month 1 balance.


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3693 views
  • 0 likes
  • 5 in conversation