omerozsutcu
Calcite | Level 5

Hello,

I have data containing 5 years of cumulative default rates (DR). I want to forecast the following 10 years using a Weibull distribution. I checked several articles and tried some solutions (e.g., "How to fit my data to Gamma, Weibull and Lognormal distributions?"), but I could not find a solution for DRs.

I tried fitting a polynomial trend but got feedback that I should use a Weibull distribution instead.

I have shared part of my data. Any help is greatly appreciated!

FreelanceReinh
Jade | Level 19

Hello @omerozsutcu and welcome to the SAS Support Communities!

I'm not sure if this is common practice in your subject area (credit risk?), but if you want to fit the parameters of a function to given values (here: the cumulative distribution function of a Weibull distribution, if I interpret your "cumulative default rates" correctly), PROC NLIN is an option. In the code below I have put your sample data (ID 1) and some made-up data (ID 2, added just to demonstrate BY-group processing) into a DATA step, which is easier to work with than an attached Excel file.
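Written out, and assuming the SAS form of the Weibull CDF with shape a and scale λ (SHAPE and SCALE in the code), the fit looks for the parameter values that minimize the unweighted sum of squared differences between observed and fitted cumulative rates, which is PROC NLIN's default least-squares criterion. The forecasts are then simply F(t) evaluated at the estimates for t = 6 to 15:

```latex
\begin{aligned}
F(t; a, \lambda) &= 1 - \exp\!\left[-\left(t/\lambda\right)^{a}\right], \qquad t \ge 0,\\
(\hat a, \hat\lambda) &= \arg\min_{a,\,\lambda} \sum_{t=1}^{5} \bigl(\mathrm{cumdr}_t - F(t; a, \lambda)\bigr)^2,\\
\widehat{\mathrm{cumdr}}_t &= F(t; \hat a, \hat\lambda), \qquad t = 6, \dots, 15.
\end{aligned}
```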

 

/* Create sample data for demonstration */

data have;
input id yr cumdr;
cards;
1 1 0.003398
1 2 0.012876
1 3 0.027161
1 4 0.03876
1 5 0.047056
2 1 0.005
2 2 0.015
2 3 0.030
2 4 0.040
2 5 0.050
;

/* Fit Weibull distributions to the data */

ods output parameterestimates=est;
proc nlin data=have;
by id;
parms shape=1 scale=50;
model cumdr = cdf('weibull',yr,shape,scale);
output out=want predicted=weibull;
run;

proc print data=want;
by id;
id id;
run;

/* Use parameter estimates to predict future CUMDR values */

proc transpose data=est out=estw(drop=_:);
by id;
var estimate;
id parameter;
run;

data pred;
set estw;
do yr=6 to 15;
  pred_cumdr=cdf('weibull',yr,shape,scale);
  output;
end;
run;

proc print data=pred;
by id;
id id;
run;

The starting values (shape=1, scale=50) in the PARMS statement of the PROC NLIN step might need to be adapted to your data. If the distributions (assuming you have more than one ID) are very different, you may even need individual starting values for each ID (see the PDATA= option of the PARMS statement).

Edit: Note that different parameterizations of the Weibull distribution exist. It's always good to check whether the formula used by SAS (see the documentation of the CDF function for the Weibull distribution) matches the formula you expect, or whether you need to transform the "shape" and "scale" values.
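For example, SAS's CDF('WEIBULL', t, a, λ) computes 1 − exp(−(t/λ)^a). Two other forms that are commonly seen convert to it as follows; the symbols k, ρ and θ are just illustrative names for the parameters of those alternative forms, so please compare them with the formula in your own source:

```latex
\begin{aligned}
\text{SAS:}\quad & F(t) = 1 - \exp\!\left[-(t/\lambda)^{a}\right]\\
\text{rate form:}\quad & F(t) = 1 - \exp\!\left[-(\rho\, t)^{k}\right] && \Rightarrow\ a = k,\ \lambda = 1/\rho\\
\theta\text{ form:}\quad & F(t) = 1 - \exp\!\left[-t^{k}/\theta\right] && \Rightarrow\ a = k,\ \lambda = \theta^{1/k}
\end{aligned}
```

The last line follows from (t/θ^{1/k})^k = t^k/θ.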

 

omerozsutcu
Calcite | Level 5
Dear Jade,

Thanks a lot for the answer. I understand the logic but I have some questions.

Is there a method that does not require me to set initial parameter values?

Second, even when I change the initial parameters, the estimates do not change significantly. For instance, customers with better ratings may end up with a worse DR in the long run than customers with worse ratings. Can I solve this problem using PROC NLIN?
FreelanceReinh
Jade | Level 19

You're welcome.


@omerozsutcu wrote:
Is there a method that does not require me to set initial parameter values?

The first thing that comes to my mind is Rick Wicklin's blog article "Use a grid search to find initial parameter values for regression models in SAS", which explains how to alleviate the requirement of specifying initial values.
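As far as I know, PROC NLIN itself can do a simple version of this: if the PARMS statement lists several candidate values per parameter, NLIN evaluates the residual sum of squares at every point of the resulting grid and starts the iterations from the best one. Below is a minimal sketch using the ID 1 sample data from above; the grid ranges are assumptions, chosen only to cover a wide range of plausible shape and scale values, and may need widening for your data:

```sas
/* Sketch: let PROC NLIN choose starting values from a grid.           */
/* The grid ranges are assumptions, not values derived from your data. */
data have;
  input id yr cumdr;
  cards;
1 1 0.003398
1 2 0.012876
1 3 0.027161
1 4 0.03876
1 5 0.047056
;

proc nlin data=have;
  by id;
  /* 6 x 20 = 120 candidate (shape, scale) pairs per BY group */
  parms shape = 0.5 to 3 by 0.5
        scale = 10 to 200 by 10;
  /* keep the iterations inside the valid parameter space */
  bounds shape > 0, scale > 0;
  model cumdr = cdf('weibull', yr, shape, scale);
  output out=want predicted=weibull;
run;
```

If the best grid point lies on the edge of the grid, that is a hint to widen the corresponding range. Note that a grid only reduces the sensitivity to starting values; it cannot make the two-parameter Weibull model itself fit better.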

 


Second, even when I change the initial parameters, the estimates do not change significantly. For instance, customers with better ratings may end up with a worse DR in the long run than customers with worse ratings. Can I solve this problem using PROC NLIN?

Maybe a more flexible model would provide a better fit, but I'm not sure if this is appropriate in your application. I know that there are three-parameter and even five-parameter Weibull distributions (see sections 42.5 and 42.8 in Evans, M., Hastings, N. and Peacock, B. (2000), Statistical Distributions, 3rd ed., John Wiley & Sons, New York). I have never used them, but their CDF formulas could be plugged into PROC NLIN in the same way as shown earlier for the two-parameter Weibull distribution.
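For instance, a common way to write a three-parameter Weibull CDF adds a threshold (location) parameter γ that shifts the curve along the time axis. I'm assuming this form here, so please check it against the reference before relying on it:

```latex
F(t; a, \lambda, \gamma) =
\begin{cases}
0, & t \le \gamma,\\[4pt]
1 - \exp\!\left[-\left(\dfrac{t - \gamma}{\lambda}\right)^{a}\right], & t > \gamma.
\end{cases}
```

In PROC NLIN this could be tried by adding a third parameter (hypothetically named THRESH) to the PARMS statement and writing model cumdr = cdf('weibull', yr - thresh, shape, scale); since the SAS Weibull CDF returns 0 for arguments at or below zero. With only five yearly observations per ID, though, a third parameter may be poorly determined.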
