Anna7
Calcite | Level 5

Hello, I would like some help with this question:

 

The data set contains S&P rating information for about 15,000 companies at both the beginning and the end of 2015.

PID: a serial number representing company ID;

Rating2015_B: S&P rating at the beginning of 2015;

Rating2015_E: S&P rating at the end of 2015; “D” means the company defaulted by the end of the year.

Use the nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.  

Explore the relationship between the length of the CIs, the number of bootstrap samples, and the confidence level.

 

I have this code: 

 

proc surveyselect data=Mylib.LoanData
     method=SRS
     n=100
     reps=100
     seed=9999
     out=Mylib.LoanDataBootStrapCor;
run;

proc corr data=Mylib.LoanDataBootStrapCor;
     by replicate;
     var Rating_2015B;
     ods output PearsonCorr=Mylib.BootStrapPearsonCorr;
run;

data Mylib.BootStrapPearsonCorr1;
     set Mylib.BootStrapPearsonCorr;
     by replicate;
     if first.replicate then delete;
run;

proc univariate data=Mylib.BootStrapPearsonCorr1 pctldef=4;
     var Rating_2015B;
     output out=Mylib.BootStrapCorCI pctlpts=&pctlpts. pctlpre=pct_;
run;

proc print data=Mylib.BootStrapCorCI;
run;

 

I would like some help understanding where I am going wrong and whether this code is correct for this particular question.

When I run the code, I get errors because SAS does not recognize BootStrapPearsonCorr1 and BootStrapCorCI.

 

Thank you!

Rick_SAS
SAS Super FREQ

I think you have the correct general idea. This is the basic framework I discuss in Chapter 15 ("Resampling and Bootstrap Methods") of my book Simulating Data with SAS.

 

1. Pearson correlation applies to numerical data. From the spreadsheet that you attached, you have categorical data. Did you mean to use PROC FREQ or some other procedure that handles categorical variables?

2. You only have one variable listed on the VAR statement in PROC CORR. The correlation of a variable with itself is always 1, so even if you had numerical data your output would contain a column of all 1s, which is probably not what you want. Options: include a WITH statement or specify more than one variable.
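A minimal sketch of that framework adapted to categorical ratings, assuming Mylib.LoanData has character variables Rating2015_B and Rating2015_E as described in the question (the posted code uses Rating_2015B, so match the names actually in the data). Two changes from the posted code matter: a nonparametric bootstrap resamples with replacement (METHOD=URS with SAMPRATE=1 and OUTHITS), not METHOD=SRS with a fixed N=100, and a default rate is a proportion, so it is computed as the mean of a 0/1 indicator instead of with PROC CORR. Reading "exclude the defaults" as dropping companies already rated D at the start of the year is an assumption, and the Work data set names are hypothetical.

```sas
/* Assumption: ratings are character values and "D" marks default.            */
/* Assumption: "exclude the defaults" = drop companies already in D at start. */
data work.Loans;
   set Mylib.LoanData;
   where Rating2015_B ne 'D';
   DefaultFlag = (Rating2015_E = 'D');   /* 1 = defaulted by end of 2015, else 0 */
run;

proc sort data=work.Loans;
   by Rating2015_B;                      /* STRATA requires input sorted by strata */
run;

/* B = 1000 bootstrap samples, drawn WITH replacement within each rating */
proc surveyselect data=work.Loans out=work.Boot noprint
     method=urs samprate=1 outhits reps=1000 seed=9999;
   strata Rating2015_B;
run;

/* Default rate = mean of the 0/1 indicator, for each rating and replicate */
proc means data=work.Boot noprint nway;
   class Rating2015_B Replicate;
   var DefaultFlag;
   output out=work.BootRates(drop=_type_ _freq_) mean=DefaultRate;
run;

/* 2.5th and 97.5th percentiles of the bootstrap distribution = 95% percentile CI */
proc univariate data=work.BootRates noprint;
   by Rating2015_B;
   var DefaultRate;
   output out=work.BootCI mean=MeanRate pctlpts=2.5 97.5 pctlpre=CL_;
run;

proc print data=work.BootCI noobs;   /* columns: Rating2015_B MeanRate CL_2_5 CL_97_5 */
run;
```

The STRATA statement keeps the number of companies in each rating fixed across replicates (a stratified bootstrap); dropping it and resampling the whole file is the other common choice. A rating with no observed defaults yields the degenerate interval [0, 0].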

 

Anna7
Calcite | Level 5

Hi, thanks for your reply. Yes, this data set has only those two columns of categorical data. I currently have the transition matrix from the beginning-of-year rating to the end-of-year rating. After that, I am confused about how the code should proceed to determine the migration relationship and the confidence intervals. I don't know how to write the code for those two parts.

 

[Attachment: Capture.PNG]

Rick_SAS
SAS Super FREQ

OK. That's a different question. Convert the matrix you have to a transition matrix of probabilities. Then you can use a simulation of the Markov chain model in SAS/IML to compute the end-of-year expected value and CIs for each start-of-year rating. 
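Converting counts to transition probabilities means dividing each row of the table by its row total. A minimal sketch with PROC FREQ, assuming the variable names from the question; Work.TransCounts and Work.TransLong are hypothetical names. The OUTPCT option adds the row percentage PCT_ROW to the output data set, and the rows with Rating2015_E = 'D' are the point estimates of the default rates that the bootstrap CIs are built around.

```sas
/* Cell counts plus row/column/table percentages for the migration table */
proc freq data=Mylib.LoanData noprint;
   tables Rating2015_B*Rating2015_E / out=work.TransCounts outpct;
run;

/* PCT_ROW / 100 = P(end in Rating2015_E | start in Rating2015_B) */
data work.TransLong;
   set work.TransCounts;
   Prob = PCT_ROW / 100;
   keep Rating2015_B Rating2015_E COUNT Prob;
run;

proc print data=work.TransLong noobs;
run;
```

This keeps the probabilities in long form (one row per cell); reshaping them into a square matrix or reading them into SAS/IML for the Markov-chain simulation is not shown.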

 

For a discussion and example, see "Markov transition matrices in SAS/IML"

 

There is a bit of a mathematical wrinkle in this approach. You have discrete categories, so unless you assign an ordinal scale, the best you can do is discuss the expected mode (the most likely end-of-year rating). If you assign numbers to the ratings (AAA=1, AA+=2, AA=3,...), then you can get a numerical mean and CI. However, it is not clear whether the scale should be linear. I will leave that to the domain experts.

Anna7
Calcite | Level 5

Thank you for your reply. I am a beginner with SAS, so I am not familiar with a lot of the terminology or many of its features.

I am not sure which approach to take to solve this question, but I have tried different things.

 

Previously I used the code from my original post to compute the confidence interval, but I don't know what is going wrong or how to improve it.

 

Then I tried the transition-matrix approach, and this is as far as I have gotten:

[Attachment: Capture.JPG]

This shows the percentage for each combination of beginning-of-year and end-of-year rating.

 

But I don't know how to write the code for the confidence intervals or for exploring the relationship. The question asks me to:

Use the nonparametric bootstrap method (exclude the defaults) to generate 95% confidence intervals for default rates of each rating category.

Explore the relationship between the length of the CIs, the number of bootstrap samples, and the confidence level.

 

It would be great if I could get some help writing the code. I am not aware of the logic and what the code should be and the functions.

Rick_SAS
SAS Super FREQ

>  I am not aware of the logic and what the code should be and the functions.

 

Let's start with the logic. Focus on the first cell (the other cells are similar). The original data show that 1029 companies started with an A rating. At the end of the time period, 832 / 1029 ≈ 0.8086 was the proportion that stayed A rated. You can compute this proportion for every rating category in the row. This is the empirical point estimate of the probability that a company that started A rated ends up in Rating[j].
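The same calculation in symbols, using notation introduced here for illustration: n_ij is the number of companies that start the year in rating i and end it in rating j, and n_i· is the row total. The default rate the assignment asks about is simply the transition probability into D, and each bootstrap sample recomputes this same ratio.

```latex
\hat{p}_{ij} = \frac{n_{ij}}{n_{i\cdot}}, \qquad n_{i\cdot} = \sum_{k} n_{ik}

\hat{p}_{A,A} = \frac{832}{1029} \approx 0.8086

\hat{d}_{i} = \hat{p}_{i,D} = \frac{n_{i,D}}{n_{i\cdot}}
```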

 

When you run a Monte Carlo simulation, you get a point estimate in each cell for each of the B simulations that you run. This gives you the sampling distribution for the transition probability. For example, you get B values in the first cell. They might be 

0.83, 0.92, 0.85, 0.73, 0.79, etc.

The mean of those values is the Monte Carlo estimate of the mean; it might be 0.82. The interval [a, b] is an approximate 95% confidence interval, where a is the 2.5th percentile and b is the 97.5th percentile of the simulated values. For each cell, you can use PROC UNIVARIATE to compute the mean and percentiles of the simulated estimates.
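For the second part of the assignment, one way to see how CI length depends on the number of bootstrap samples B and on the confidence level is to repeat the percentile step over a grid of values. This is a hypothetical sketch: the macro name %BootCILength, the grids of B and levels, and the Work data set names are made up for illustration, and it rebuilds the 0/1 default indicator (with the same assumption that "exclude the defaults" means dropping companies rated D at the start) so it runs on its own against Mylib.LoanData. PCTLNAME= gives the percentile variables fixed names (CL_Lower, CL_Upper) regardless of the level.

```sas
/* Assumption: character ratings, "D" = default, start-of-year D rows excluded */
data work.Loans;
   set Mylib.LoanData;
   where Rating2015_B ne 'D';
   DefaultFlag = (Rating2015_E = 'D');
run;

proc sort data=work.Loans;
   by Rating2015_B;                         /* STRATA requires sorted input */
run;

%macro BootCILength(BList=100 500 1000 5000, LevelList=90 95 99);
   %local i j B lev lo hi;
   %do i = 1 %to %sysfunc(countw(&BList, %str( )));
      %let B = %scan(&BList, &i, %str( ));

      /* B bootstrap resamples, drawn with replacement within each rating */
      proc surveyselect data=work.Loans out=work.Boot noprint
           method=urs samprate=1 outhits reps=&B seed=9999;
         strata Rating2015_B;
      run;

      /* Bootstrap distribution of the default rate for each rating */
      proc means data=work.Boot noprint nway;
         class Rating2015_B Replicate;
         var DefaultFlag;
         output out=work.BootRates(drop=_type_ _freq_) mean=DefaultRate;
      run;

      %do j = 1 %to %sysfunc(countw(&LevelList, %str( )));
         %let lev = %scan(&LevelList, &j, %str( ));
         %let lo  = %sysevalf((100 - &lev) / 2);   /* e.g. 2.5 for a 95% CI */
         %let hi  = %sysevalf(100 - &lo);           /* e.g. 97.5 for a 95% CI */

         proc univariate data=work.BootRates noprint;
            by Rating2015_B;
            var DefaultRate;
            output out=work.CI pctlpts=&lo &hi pctlpre=CL_ pctlname=Lower Upper;
         run;

         /* Record B, the level, and the interval length for this combination */
         data work.CI;
            set work.CI;
            B = &B;
            Level = &lev;
            CILength = CL_Upper - CL_Lower;
         run;

         proc append base=work.CILengths data=work.CI;
         run;
      %end;
   %end;
%mend BootCILength;

proc datasets lib=work nolist;              /* start from an empty results table */
   delete CILengths;
quit;

%BootCILength()

proc sort data=work.CILengths;
   by Rating2015_B Level B;
run;

/* One panel per rating: CI length versus B, one line per confidence level */
proc sgpanel data=work.CILengths;
   panelby Rating2015_B;
   series x=B y=CILength / group=Level markers;
run;
```

As B grows, the endpoints should settle down rather than shrink toward zero: B controls the Monte Carlo error of the interval, while its length is driven mainly by the number of companies in each rating and by the confidence level. For a fixed set of bootstrap estimates, a higher confidence level always gives an interval at least as long.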

 
