“Trend Analysis” - SAS logic?
I am looking to identify a trend for client’s balances of a Bank whose historical data exist in the dataset for a calendar year of 12 months.
To analyze trend I would like to see how balances of clients increase, decrease or stay the same using the following business rule by creating 3 new variables.
1. /* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/
2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance */
3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance */
The VERY LARGE dataset looks the following with the 3 columns:
clientid Date Balances
1100020 07/02/2011 37897
1100020 07/03/2011 89789
1100345 07/02/2011 78989
1100768 08/04/2011 768687
<-and so on..................------>
/*multiple clients-so multiple client ids, their balances for the dates between the period Jan 2011 and Dec2012-so 12 months data*/
I tried a self join In proc sql with intnx and aggregate function however I didn’t get the desired results.
Without the time series your data will work as is but you'll have to change the variable names below to what you end up with. The column headings for the monthly values aren't valid SAS variable namea and you may either end up with _08_31_2012 or VAR1 by default depending on import method. In some respects if they come in as VAR1 through VAR12 the coding may be easier.
/* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/
trend = (VAR12 - VAR1)/ VAR1;
2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance */
Short_term_growth = (max( of var10-var12) - Var12)/ Var12;
3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance */
long_term_growth = (max( of var1-var12) - Var12)/ Var12;
growth = VAR12 - mean(of var1-var12);
And if you sort by client ID you could reshape the data as recommended by ETS_KPS above and run a simple linear regression on model of value = date and look at the slope parameter and statistics which might be telling if there is a big change in balance between the last and next-to-last month. Consider what happens if months 2 to 11 show and average of 100 increase in balance but the month 12 balance is roughly the same as the month 1 balance.
1. Do you want to do this for one client, a few selected clients, or all of the clients in your file?
2. Is it a rolling monthly summary that you want (i.e. end of Jan, end of Feb, ...), for all of the months of the year, or is it just for annual periods? If the latter, what is the month of the end of the period?
3. How large is very large? tens of millions, hundreds of millions, billions?
4. What storage technology / database is your source data in? Reason for asking is that in requests like this, the ability to push processing into a database can DRASTICALLY improve speed.
Tom
Hi, did you ever get this figured out? I am doing something similar and would like some guidance.
Seth
Same questions that I asked Nav_Sweeney.
Tom
Hi Tom,
Thanks for reply.
All the clients in the file, the flat file is less that 50,000 records currently in excel. Annual periods. This is what my data looks like:
ID | 08_31_2012 | 09_30_2012 | 10_31_2012 | 11_30_2012 | 12_31_2012 | 01_31_2013 | 02_28_2013 | 03_31_2013 | 04_30_2013 | 05_31_2013 | 06_30_2013 | 07_31_2013 |
410764 | 358 | 317 | 327 | 167 | 167 | 379 | 336 | 349 | 363 | 369 | 327 | 348 |
89254 | 303 | 261 | 303 | 98 | 98 | 279 | 227 | 272 | 275 | 274 | 291 | 317 |
65320 | 233 | 211 | 234 | 101 | 101 | 280 | 250 | 264 | 267 | 304 | 263 | 302 |
163328 | 218 | 188 | 227 | 98 | 98 | 249 | 230 | 227 | 251 | 246 | 238 | 277 |
36919 | 174 | 198 | 243 | 80 | 80 | 218 | 203 | 210 | 236 | 211 | 197 | 232 |
28341 | 78 | 159 | 207 | 61 | 61 | 177 | 171 | 183 | 191 | 179 | 180 | 204 |
448691 | 32 | 32 | 56 | 60 | 83 | 179 | 192 |
Thanks!
Seth
Well, the first thing you need to do is reshape your data from WIDE to LONG with PROC TRANSPOSE.
proc transpose data=wide out=long;
by date;
run;
(the first post doesn't need this)
Make sure your date variable is formatted as a date and then you can use the following code to accomplish the 12 month percentage change. It uses a new ETS procedure called TIMEDATA (h/t to the TIMEDATA developers for this). The other operations will be very similar.
Just a little background on TIMEDATA... It is extremely efficient at the types of time series operations that you are trying to do.
data verylarge; set sashelp.air;
clientid=1;
balances=air;
run;
proc timedata data=VERYLARGE out=_NULL_ outarray=trend plot=arrays;
by clientid;
id date interval=month accumulate=average;
var balances;
outarray trend;
do t=1 to _LENGTH_;
if t < _SEASONALITY_ then trend
else do;
previous = balances[t-_SEASONALITY_];
recent = balances
if previous = 0 then trend
else trend
end;
end;
run;
Hi, unfortunately we are not licensed to use Time Series Analysis.
Seth
Do you need/want the same three calculations as the original post or something else to determine "trend".
Hi, Yes the first 3 would be good... but I also think ((latest month) minus (Average of first month through latest month)) would be a good demonstration of growth/loss.
Without the time series your data will work as is but you'll have to change the variable names below to what you end up with. The column headings for the monthly values aren't valid SAS variable namea and you may either end up with _08_31_2012 or VAR1 by default depending on import method. In some respects if they come in as VAR1 through VAR12 the coding may be easier.
/* trend = (Average of 12th Month Balance - Average of 1st month Balance)/Average of 1st Month Balance*/
trend = (VAR12 - VAR1)/ VAR1;
2. /*Short_term_growth= (Maximum Balance over last 3 months - Latest Balance)/latest balance */
Short_term_growth = (max( of var10-var12) - Var12)/ Var12;
3 /*Long_term_growth= (Maximum Balance over 12 Months – Latest Balance)/Latest Balance */
long_term_growth = (max( of var1-var12) - Var12)/ Var12;
growth = VAR12 - mean(of var1-var12);
And if you sort by client ID you could reshape the data as recommended by ETS_KPS above and run a simple linear regression on model of value = date and look at the slope parameter and statistics which might be telling if there is a big change in balance between the last and next-to-last month. Consider what happens if months 2 to 11 show and average of 100 increase in balance but the month 12 balance is roughly the same as the month 1 balance.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.