bootstrap for default rates

12-06-2017 09:03 PM

Hello, I would like some help with this question:

The data contain S&P rating information for around 15,000 companies at both the beginning and the end of 2015.

PID: a serial number representing company ID;

Rating2015_B: S&P rating at the beginning of the year 2015;

Rating2015_E: S&P rating at the end of the year 2015; “D” means the company defaulted by the end of the year.

Use the nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.

Explore the relationship between the length of CIs and number of bootstrap samples and confidence level.

I have this code:

    proc surveyselect data=Mylib.LoanData
        method=SRS
        n=100
        reps=100
        seed=9999
        out=Mylib.LoanDataBootStrapCor;
    run;

    proc corr data=Mylib.LoanDataBootStrapCor;
        by replicate;
        var Rating_2015B;
        ods output PearsonCorr=Mylib.BootStrapPearsonCorr;
    run;

    data Mylib.BootStrapPearsonCorr1;
        set Mylib.BootStrapPearsonCorr;
        by replicate;
        if first.replicate then delete;
    run;

    proc univariate data=Mylib.BootStrapPearsonCorr1 pctldef=4;
        var Rating_2015B;
        output out=Mylib.BootStrapCorCI pctlpts=&pctlpts. pctlpre=pct_;
    run;

    proc print data=Mylib.BootStrapCorCI;
    run;

I would like help understanding where I am going wrong and whether this code is appropriate for this particular question.

When I run the code, I get an error because SAS does not recognize BootStrapPearsonCorr1 and BootStrapCorCI.

Thank you!


Posted in reply to Anna7

12-08-2017 09:49 AM

I think you have the correct general idea. This is the basic framework I discuss in Chapter 15 ("Resampling and Bootstrap Methods") of my book Simulating Data with SAS.

1. Pearson correlation applies to numerical data. From the spreadsheet that you attached, you have categorical data. Did you mean to use PROC FREQ or some other procedure that handles categorical variables?

2. You only have one variable listed on the VAR statement in PROC CORR. The correlation of a variable with itself is always 1, so even if you had numerical data your output would contain a column of all 1s, which is probably not what you want. Options: include a WITH statement or specify more than one variable.
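As a minimal sketch of the PROC FREQ route for categorical ratings, assuming the data set is Mylib.LoanData and the variables are named Rating2015_B and Rating2015_E as in the question (the posted code uses Rating_2015B, so the actual name should be checked against the data). The data set Work.Ratings and the indicator variable Default are hypothetical names introduced for illustration.

```sas
/* Flag defaults: Default = 1 when the end-of-year rating is "D", else 0. */
/* Work.Ratings and Default are hypothetical names, not from the thread.  */
data Work.Ratings;
   set Mylib.LoanData;
   Default = (Rating2015_E = "D");
run;

/* Cross-tabulate start-of-year rating by default status.                 */
/* The row percentage in the Default=1 column is the observed default     */
/* rate for that rating category.                                         */
proc freq data=Work.Ratings;
   tables Rating2015_B * Default / nocol nopercent;
run;
```

This only gives the observed default rates; it does not yet produce confidence intervals.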


Posted in reply to Rick_SAS

12-08-2017 10:02 AM

Hi, thanks for your reply. Yes, with this dataset I only have those two columns of categorical data. Currently I have the transition matrix from the beginning-of-year rating to the end-of-year rating. After that, I am confused about how the code should proceed to determine the migration relationship and the confidence interval. I do not know how to write the code for those two parts.


Posted in reply to Anna7

12-08-2017 10:19 AM - edited 12-08-2017 10:35 AM

OK. That's a different question. Convert the matrix you have to a transition matrix of probabilities. Then you can use a simulation of the Markov chain model in SAS/IML to compute the end-of-year expected value and CIs for each start-of-year rating.

For a discussion and example, see "Markov transition matrices in SAS/IML"
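One possible reading of this suggestion, as a rough SAS/IML sketch rather than the method from the referenced article: row-normalize a matrix of counts into transition probabilities, then simulate end-of-year outcomes for one starting rating with a multinomial draw. The counts below are hypothetical except for 832 and the row total 1029, which appear later in the thread. Note that this is a parametric simulation, not the nonparametric bootstrap that the assignment asks for.

```sas
proc iml;
call randseed(9999);

/* HYPOTHETICAL counts: rows = start-of-year rating, columns = end-of-year  */
/* state. Only 832 and the first row total (1029) come from the thread;     */
/* every other number is made up for illustration.                          */
counts = {832 190  7,
           60 900 40};

/* Transition matrix: divide each row of counts by its row sum */
P = counts / counts[,+];

/* Simulate B end-of-year outcomes for companies that start in row 1 */
B = 1000;                          /* number of simulations */
n = counts[1,+];                   /* companies starting in the first rating */
X = RandMultinomial(B, n, P[1,]);  /* B x 3 matrix of simulated counts */
est = X / n;                       /* simulated transition proportions */

/* Column means and 2.5th / 97.5th percentiles of the simulated proportions */
avg = mean(est);
call qntl(CI, est, {0.025 0.975});
print P[format=6.4], avg[format=6.4], CI[format=6.4];
quit;
```

Each column of `CI` holds the lower and upper percentile for one end-of-year state.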

There is a bit of a mathematical wrinkle in this approach. You have discrete categories, so unless you assign an ordinal scale, the best you can do is discuss the expected mode. If you assign numbers for the ratings (AAA=1, AA+=2, AA=3,...) then you can get a numerical mean and CI. However, it is not clear whether the scale should be linear. I will leave that to the domain experts.


Posted in reply to Rick_SAS

12-08-2017 01:47 PM

Thank you for your reply. I am a beginner with SAS, so I am not familiar with much of the terminology or many of its features.

I am not really sure which route to take to solve this question, but I have attempted different things.

Previously I used this code to compute the confidence interval, but I do not know what is going wrong or how to improve it:

    proc surveyselect data=Mylib.LoanData
        method=SRS
        n=100
        reps=100
        seed=9999
        out=Mylib.LoanDataBootStrapCor;
    run;

    proc corr data=Mylib.LoanDataBootStrapCor;
        by replicate;
        var Rating_2015B;
        ods output PearsonCorr=Mylib.BootStrapPearsonCorr;
    run;

    data Mylib.BootStrapPearsonCorr1;
        set Mylib.BootStrapPearsonCorr;
        by replicate;
        if first.replicate then delete;
    run;

    proc univariate data=Mylib.BootStrapPearsonCorr1 pctldef=4;
        var Rating_2015B;
        output out=Mylib.BootStrapCorCI pctlpts=&pctlpts. pctlpre=pct_;
    run;

    proc print data=Mylib.BootStrapCorCI;
    run;

So then I tried the transition matrix and right now I have gotten this far.

This has the percentage for every default rate from the beginning of the year to the end.

But I do not know how to write the code for the confidence interval or for determining any relationship. The question asks to use:

nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.

Explore the relationship between the length of CIs and number of bootstrap samples and confidence level.

It would be great if I could get some help writing the code. I am not aware of the logic and what the code should be and the functions.


Posted in reply to Anna7

12-13-2017 03:14 PM

*> I am not aware of the logic and what the code should be and the functions.*

Let's start with the logic. Focus on the first cell (the other cells are similar). The original data says that there were 1029 companies that started with an A rating. At the end of the time period, 832 / 1029 ≈ 0.8086 was the proportion that stayed A rated. You can compute this proportion for every rating category in the row. This is the empirical point estimate for the probability that a company that started A rated ends up in Rating[j].
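The empirical point estimates for every cell could be computed along these lines. This is a sketch that assumes the data set Mylib.LoanData and the variable names Rating2015_B and Rating2015_E from the question; Work.TransCounts, Work.TransProb, and Prob are hypothetical names.

```sas
/* Count every (start rating, end rating) pair. OUTPCT adds row and column */
/* percentages (PCT_ROW, PCT_COL) to the output data set.                  */
proc freq data=Mylib.LoanData noprint;
   tables Rating2015_B * Rating2015_E / out=Work.TransCounts outpct;
run;

/* Convert the row percentage to a proportion: the estimated probability   */
/* of ending in Rating2015_E given a start in Rating2015_B.                */
data Work.TransProb;
   set Work.TransCounts;
   Prob = PCT_ROW / 100;
   keep Rating2015_B Rating2015_E Count Prob;
run;

proc print data=Work.TransProb noobs;
run;
```

For the cell discussed above, the row for A to A would show Count = 832 and Prob = 832 / 1029, provided the data match the counts quoted in the post.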

When you run a Monte Carlo simulation, you get a point estimate in each cell for **each** of the B simulations that you run. This gives you the sampling distribution for the transition probability. For example, you get B values in the first cell. They might be

0.83, 0.92, 0.85, 0.73, 0.79, etc.

The mean of those values is the Monte Carlo estimate of the mean. It might be 0.82. The interval [*a*, *b*] is an approximate 95% confidence interval, where *a* is the 2.5th percentile and *b* is the 97.5th percentile. For each cell, you can use PROC UNIVARIATE to compute the mean and percentiles of the simulated estimates.
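Applied to the default rates in the original question, the resample / estimate / percentile steps might look like the sketch below. It assumes Mylib.LoanData with variables Rating2015_B and Rating2015_E; the macro variable NumReps, the Work data sets, and the variables Default, DefaultRate, BootMean, and the CI_ prefix are hypothetical names. It uses METHOD=URS (sampling with replacement), unlike METHOD=SRS in the posted code, which samples without replacement. PCTLPTS= is given literal values here because the posted code refers to a macro variable &pctlpts that is not defined anywhere in the thread. The whole data set is resampled, not each rating category separately; that choice is an assumption.

```sas
%let NumReps = 1000;   /* number of bootstrap samples B (hypothetical value) */

/* 1. Flag defaults: Default = 1 when the end-of-year rating is "D" */
data Work.Ratings;
   set Mylib.LoanData;
   Default = (Rating2015_E = "D");
run;

/* 2. Draw bootstrap samples WITH replacement, each as large as the data. */
/*    OUTHITS writes one row per selection, so repeats appear as rows.    */
proc surveyselect data=Work.Ratings out=Work.BootSamp noprint
     seed=9999 method=urs samprate=1 outhits reps=&NumReps;
run;

/* 3. Default rate per replicate and rating = mean of the 0/1 indicator */
proc means data=Work.BootSamp noprint nway;
   class Replicate Rating2015_B;
   var Default;
   output out=Work.BootRates mean=DefaultRate;
run;

/* 4. Percentile interval per rating across the bootstrap replicates */
proc sort data=Work.BootRates;
   by Rating2015_B;
run;

proc univariate data=Work.BootRates noprint;
   by Rating2015_B;
   var DefaultRate;
   output out=Work.BootCI mean=BootMean
          pctlpts=2.5 97.5 pctlpre=CI_;   /* creates CI_2_5 and CI_97_5 */
run;

proc print data=Work.BootCI noobs;
run;
```

To explore how the interval length depends on the number of bootstrap samples and the confidence level, rerun with different values of NumReps and different percentile pairs (for example, 5 and 95 for a 90% interval) and compare CI_97_5 - CI_2_5.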