Re: bootstrap for default rates

Anna7 · Posted 12-06-2017 09:03 PM

Hello, I would like some help with this question:

The data contain S&P rating information for around 15,000 companies at the beginning and at the end of the year 2015.

PID: a serial number representing company ID;

Rating2015_B: S&P rating at the beginning of the year 2015;

Rating2015_E: S&P rating at the end of the year 2015; “D” means the company had defaulted by the end of the year.

Use the nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.

Explore the relationship between the length of CIs and number of bootstrap samples and confidence level.

I have this code:

proc surveyselect data=Mylib.LoanData
   method=SRS
   n=100
   reps=100
   seed=9999
   out=Mylib.LoanDataBootStrapCor;
run;

proc corr data=Mylib.LoanDataBootStrapCor;
   by replicate;
   var Rating_2015B;
   ods output PearsonCorr=Mylib.BootStrapPearsonCorr;
run;

data Mylib.BootStrapPearsonCorr1;
   set Mylib.BootStrapPearsonCorr;
   by replicate;
   if first.replicate then delete;
run;

proc univariate data=Mylib.BootStrapPearsonCorr1 pctldef=4;
   var Rating_2015B;
   output out=Mylib.BootStrapCorCI pctlpts=&pctlpts. pctlpre=pct_;
run;

proc print data=Mylib.BootStrapCorCI; run;

I would like some help understanding where I am going wrong and whether this code is correct for this particular question.

When I run the code, I get an error because SAS does not recognize BootStrapPearsonCorr1 and BootStrapCorCI.

Thank you!

Rick_SAS · Posted 12-08-2017 09:49 AM

I think you have the correct general idea. This is the basic framework I discuss in Chapter 15 ("Resampling and Bootstrap Methods") of my book Simulating Data with SAS.

1. Pearson correlation applies to numerical data. From the spreadsheet that you attached, you have categorical data. Did you mean to use PROC FREQ or some other procedure that handles categorical variables?

2. You only have one variable listed on the VAR statement in PROC CORR. The correlation of a variable with itself is always 1, so even if you had numerical data your output would contain a column of all 1s, which is probably not what you want. Options: include a WITH statement or specify more than one variable.
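As a minimal sketch of the PROC FREQ alternative mentioned in point 1: the cross-tabulation below counts how many companies moved from each start-of-year rating to each end-of-year rating. It assumes the data set is Mylib.LoanData and that the variables are named Rating2015_B and Rating2015_E as in the problem statement (the posted code uses Rating_2015B, so adjust the names to match your data).

```sas
/* Assumed names: Mylib.LoanData, Rating2015_B, Rating2015_E */
proc freq data=Mylib.LoanData;
   /* rows = rating at start of year, columns = rating at end of year */
   /* NOCOL and NOPERCENT leave only the counts and the row percentages */
   tables Rating2015_B * Rating2015_E / nocol nopercent;
run;
```

The row percentage in the "D" column of each row is the observed default rate for that starting rating.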

Anna7 · Posted 12-08-2017 10:02 AM

Hi, thanks for your reply. Yes, this dataset has only those 2 columns of categorical data. Currently I have the transition matrix from the beginning-of-year rating to the end-of-year rating. After that, I am confused about how the code should proceed to determine the migration relationship and the confidence interval. I do not know how to write the code for those two parts.

Rick_SAS · Posted 12-08-2017 10:19 AM

OK. That's a different question. Convert the matrix you have to a transition matrix of probabilities. Then you can use a simulation of the Markov chain model in SAS/IML to compute the end-of-year expected value and CIs for each start-of-year rating.

For a discussion and example, see "Markov transition matrices in SAS/IML"

There is a bit of a mathematical wrinkle in this approach. You have discrete categories, so unless you assign an ordinal scale, the best you can do is discuss the expected mode. If you assign numbers for the ratings (AAA=1, AA+=2, AA=3,...) then you can get a numerical mean and CI. However, it is not clear whether the scale should be linear. I will leave that to the domain experts.
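To illustrate the conversion step, here is a small SAS/IML sketch that turns a matrix of counts into a matrix of transition probabilities by dividing each row by its row sum. The 3 x 3 counts, the rating labels, and the linear scores are all hypothetical values chosen for illustration; they are not taken from the attached data.

```sas
proc iml;
/* Hypothetical counts: rows = start-of-year rating, columns = end-of-year rating */
ratings = {"A" "B" "C"};
counts  = {832 150  47,
            60 700 140,
            10  90 400};

/* counts[ ,+] is the column vector of row sums; dividing by it scales each row to sum to 1 */
P = counts / counts[ ,+];
print P[rowname=ratings colname=ratings format=6.4];

/* Assumed linear ordinal scale (A=1, B=2, C=3); whether a linear scale is appropriate is a domain question */
score = {1, 2, 3};
expected = P * score;      /* expected end-of-year score for each starting rating */
print expected[rowname=ratings format=6.3];
quit;
```

Each row of P sums to 1, and P * score gives the expected end-of-year score only under the assumed scale.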

Anna7 · Posted 12-08-2017 01:47 PM

Thank you for your reply. I am at beginner level for SAS so I am not aware of a lot of the terminology and many features of SAS.

I am not really sure which route to take in order to solve this question, but I have attempted different things.

Previously I used this code to compute the confidence interval, but I do not know what is going wrong or how to improve it:

proc surveyselect data=Mylib.LoanData
   method=SRS
   n=100
   reps=100
   seed=9999
   out=Mylib.LoanDataBootStrapCor;
run;

proc corr data=Mylib.LoanDataBootStrapCor;
   by replicate;
   var Rating_2015B;
   ods output PearsonCorr=Mylib.BootStrapPearsonCorr;
run;

data Mylib.BootStrapPearsonCorr1;
   set Mylib.BootStrapPearsonCorr;
   by replicate;
   if first.replicate then delete;
run;

proc univariate data=Mylib.BootStrapPearsonCorr1 pctldef=4;
   var Rating_2015B;
   output out=Mylib.BootStrapCorCI pctlpts=&pctlpts. pctlpre=pct_;
run;

proc print data=Mylib.BootStrapCorCI; run;

So then I tried the transition matrix and right now I have gotten this far.

This has the percentage for every default rate from the beginning of the year to the end.

But I do not know how to write the code for the confidence interval or for determining any relationship. The question asks to use:

nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.

Explore the relationship between the length of CIs and number of bootstrap samples and confidence level.

It would be great if I could get some help writing the code. I do not understand the logic, what the code should be, or which functions to use.

Rick_SAS · Posted 12-13-2017 03:14 PM

> I am not aware of the logic and what the code should be and the functions.

Let's start with the logic. Focus on the first cell (the other cells are similar). The original data say that 1029 companies started with an A rating. At the end of the time period, 832 / 1029 = 0.8085 is the proportion that stayed A rated. You can compute this proportion for every rating category in the row. This is the empirical point estimate for the probability that a company that started A rated ends up in Rating[j].

When you run a Monte Carlo simulation, you get a point estimate in each cell for each of the B simulations that you run. This gives you the sampling distribution for the transition probability. For example, you get B values in the first cell. They might be

0.83, 0.92, 0.85, 0.73, 0.79, etc.

The mean of those values is the Monte Carlo estimate of the mean. It might be 0.82. The interval [a,b] is an approximate 95% confidence interval where a is the 2.5th percentile and b is the 97.5th percentile. For each cell, you can use PROC UNIVARIATE to compute the mean and percentiles of the simulated estimates.
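The logic above can be sketched in SAS roughly as follows. This is one possible implementation under stated assumptions, not the only way to do it: the data set is assumed to be Mylib.LoanData with variables Rating2015_B and Rating2015_E, "exclude the defaults" is read as dropping companies that already have rating "D" at the start of the year, and the Work data set names, the Default flag, and the macro variable B are made up for illustration. Note that a bootstrap resamples WITH replacement (METHOD=URS), whereas METHOD=SRS samples without replacement.

```sas
%let B = 1000;   /* number of bootstrap resamples (illustrative choice) */

/* 1. Draw B resamples with replacement, each the same size as the input data */
proc surveyselect data=Mylib.LoanData(where=(Rating2015_B ne "D"))
      out=Work.BootSamp noprint seed=9999
      method=urs samprate=1 outhits reps=&B;
run;

/* 2. Flag defaults; the mean of a 0/1 flag is the default rate */
data Work.BootSamp;
   set Work.BootSamp;
   Default = (Rating2015_E = "D");
run;

/* One default rate per (Replicate, starting rating) */
proc means data=Work.BootSamp noprint nway;
   class Replicate Rating2015_B;
   var Default;
   output out=Work.BootRates mean=DefaultRate;
run;

/* 3. Percentile interval for each starting rating */
proc sort data=Work.BootRates;
   by Rating2015_B;
run;

proc univariate data=Work.BootRates noprint;
   by Rating2015_B;
   var DefaultRate;
   /* 2.5th and 97.5th percentiles give an approximate 95% interval */
   output out=Work.BootCI mean=BootMean pctlpts=2.5 97.5 pctlpre=P_;
run;

proc print data=Work.BootCI; run;
```

The output variables P_2_5 and P_97_5 are the interval endpoints. To explore how the interval length behaves, rerun with different values of B, and change PCTLPTS= for other confidence levels (for example, 5 and 95 for a 90% interval).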
