Season
Barite | Level 11

I am interested in building a prediction model that consists of two parts: (1) determining whether an event happens (via logistic regression, or other models if logistic regression fails) and (2) determining the severity of the event (severity is measured by a continuous variable, so I prefer multiple linear regression; other models will of course be tried if that fails). I wonder (1) whether bootstrap resampling is a good choice for validating a prediction model built this way; (2) whether SAS can perform bootstrap resampling; and (3) how to perform bootstrap resampling in SAS 9.4 TS1M2. Thank you!

1 ACCEPTED SOLUTION

Season
Barite | Level 11

I am here to answer my own question or, more precisely, to cite a solution provided elsewhere. The paper Using SAS to Validate Prediction Models gives a very detailed discussion of how to implement internal validation of a prediction model by bootstrap resampling in SAS. I now have a newer version of SAS, so I am not sure whether the entire code runs on SAS 9.4 TS1M2.

For data analysts who are new to this field, eager to solve the problem at hand, and short on time to understand the details of that code (as I was some two years ago when I raised this question), my message is this: using bootstrap resampling for model validation requires you to build a model on every bootstrap sample. Model-building procedures such as REG, LOGISTIC, and PHREG cannot generate bootstrap samples themselves. You have to generate the bootstrap samples elsewhere (e.g., with the SURVEYSELECT procedure or the macro given in the paper I cited) and then build a model on each bootstrap sample. An efficient way to do the latter is the BY statement, which is available in the majority of model-building procedures. Do not spend your time rereading the SAS Help sections for the model-building procedures in search of an option or statement that does this without the help of other modules (e.g., the SURVEYSELECT procedure), as I did about two years ago.
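
A minimal sketch of that two-step pattern, assuming an input data set a with a binary response x and predictors y and z (the names used in the code posted in the replies). The output names boot and pe and the seed value are placeholders, and this is not the macro from the cited paper:

```sas
/* Step 1: draw B = 1000 bootstrap samples of size n with replacement.  */
/* METHOD=URS with SAMPRATE=1 and OUTHITS writes one row per selection. */
/* The seed value is an arbitrary placeholder.                          */
proc surveyselect data=a out=boot seed=12345 noprint
                  method=urs samprate=1 outhits reps=1000;
run;

/* Step 2: fit the same model once per bootstrap sample.                */
/* SURVEYSELECT adds the variable Replicate, which drives BY processing. */
ods exclude all;                      /* suppress 1000 sets of printed tables */
proc logistic data=boot;
   by Replicate;
   model x(event='1') = y z;
   ods output ParameterEstimates=pe;  /* one set of estimates per replicate */
run;
ods exclude none;
```

The data set pe then holds one block of parameter estimates per value of Replicate. Any other ODS table the procedure produces can be captured the same way.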


9 REPLIES
SteveDenham
Jade | Level 19

Yes, bootstrapping can be done. Check the blogs from @Rick_SAS; there should be some very comprehensive articles on the subject there.

SteveDenham

Season
Barite | Level 11
Thank you, Steve, for your kind help!
SylvainTremblay
SAS Employee

Yes, SAS can perform Bootstrap resampling and there are many ways to do so.

This blog entry describes best practices and techniques: The essential guide to bootstrapping in SAS

 

Regards,

Sylvain

Season
Barite | Level 11

Thank you, Sylvain, for the help you offered!

Rick_SAS
SAS Super FREQ

In SAS, PROC SEVERITY can model data like this. Also, PROC GENMOD can model the data by using the Tweedie distribution, which is often used for modeling in the insurance industry.

> (1) whether Bootstrap resampling is a good choice when it comes to the validation of my prediction model formed by the procedures above

The bootstrap is not really a validation procedure. It is an inferential method for approximating the standard error of a statistic or the confidence interval for a parameter. Perhaps you are thinking of cross-validation techniques? PROC HPGENSELECT can model the same data as PROC GENMOD, but also supports splitting the data into training, testing, and validation subsets.
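
A minimal sketch of the partitioning idea, assuming a data set a with a continuous response named severity (a hypothetical name) and predictors y and z. The fractions and the seed are arbitrary placeholders:

```sas
/* Fit a Tweedie model while holding out part of the data.          */
/* severity is a hypothetical variable name; the fractions and seed */
/* are placeholders chosen only for illustration.                   */
proc hpgenselect data=a;
   partition fraction(validate=0.3 test=0.2 seed=12345);
   model severity = y z / distribution=tweedie link=log;
run;
```

The remaining 50% of the observations are used for training, so fit statistics on the held-out roles give an assessment that does not reuse the training data.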

 

> (2) if SAS can perform Bootstrap resampling

Yes.

> (3) the method of performing Bootstrap resampling on SAS 9.4 TS1M2.

For predictive models with iid errors, you have two choices:

Season
Barite | Level 11

Thank you, Rick, for kindly offering me help!

Actually, I haven't heard about PROC SEVERITY before. Thank you for your suggestion.

Season
Barite | Level 11

Hello, Rick. When I raised my question in early December last year, I had not yet reviewed what I knew about model validation and bootstrap resampling. After reviewing the topic and searching for information on using bootstrap resampling for internal model validation, I found that the information you offered was the most useful.

As you mentioned, bootstrap resampling was not designed specifically for model validation. Currently, however, bootstrap resampling can be regarded as the top choice for the internal validation of prediction models. To the best of my knowledge, using bootstrap resampling for the internal validation of a prediction model proceeds as follows: (1) Suppose that we have n observations. Use bootstrap resampling to generate B samples, where B is the number of times the resampling is repeated and each sample contains n observations. (2) Perform internal validation for each sample. (3) "Average" the statistics that assess the discrimination and calibration of the prediction model (e.g., C-statistic, Brier score). By the way, I am still not sure how to "average" the statistics (e.g., whether the final statistic of the prediction model is a weighted average or an arithmetic average of the statistics from the individual bootstrap samples).
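
On the "averaging" question: one widely used scheme, the optimism-corrected bootstrap described by Harrell and by Steyerberg, uses a plain arithmetic mean over the B samples. It is sketched here as an assumption about what is intended, not as something established in this thread, and the symbols are introduced only for illustration:

```latex
\hat{O} = \frac{1}{B}\sum_{b=1}^{B}\left[\theta_b^{\mathrm{boot}} - \theta_b^{\mathrm{orig}}\right],
\qquad
\theta_{\mathrm{corrected}} = \theta_{\mathrm{apparent}} - \hat{O}
```

Here $\theta_b^{\mathrm{boot}}$ is the performance statistic of the model fitted to bootstrap sample $b$ and evaluated on that same sample, $\theta_b^{\mathrm{orig}}$ is the same model evaluated on the original data, and $\theta_{\mathrm{apparent}}$ is the statistic of the model fitted to and evaluated on the original data.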

Therefore, after consulting SAS Help, the blog you mentioned, and discussions posted on other communities, I have written the following SAS code to perform internal validation of a prediction model using bootstrap resampling.

proc surveyselect data=a out=b method=balbootstrap reps=1000; /* B=1000 here, a number frequently used for bootstrap resampling */
run;
proc hplogistic data=b;
   by replicate; /* run PROC HPLOGISTIC B=1000 times, once per value of replicate */
   model x(event='1')=y z/lackfit;
   partition fraction(test=0.5 validate=0); /* perform model validation */
   output out=c/allstats;
run;

In brief, the user should perform bootstrap resampling before building and validating the prediction model.

When this code is run, SAS first performs bootstrap resampling and generates 1000 samples, then builds the prediction model and performs internal validation sample by sample. But SAS ultimately produces B (which equals 1000 here) groups of statistics on the discrimination and calibration of the prediction model.

I wonder (1) whether my code is correct and (2) how to generate "overall statistics" on the discrimination and calibration of the prediction model after finishing the process above.

Thanks!

Rick_SAS
SAS Super FREQ

> (1) if my code is correct

Your code will generate B bootstrap samples and statistics, but I don't understand what you think you will accomplish by doing this. Bootstrapping enables you to approximate the standard errors and confidence intervals for any statistic from the logistic regression (e.g., the area under the ROC curve). But your code does not capture any of those statistics. It does write observation-wise statistics like the predicted value, raw residuals, and Pearson residuals for each observation, but I do not know how to use those values to determine "discrimination and calibration."
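
A minimal sketch of what capturing such a statistic could look like, assuming the bootstrap data set b from the code posted above (with its replicate variable) and using PROC LOGISTIC rather than PROC HPLOGISTIC. The data set names assoc, cstat, and ci are placeholders:

```sas
/* Fit the model per replicate and save the Association table, */
/* whose last row holds the c statistic (area under the ROC).  */
ods exclude all;
proc logistic data=b;
   by replicate;
   model x(event='1') = y z;
   ods output Association=assoc;
run;
ods exclude none;

/* Keep one c statistic per bootstrap sample. */
data cstat;
   set assoc;
   where Label2 = 'c';
   c = nValue2;
   keep replicate c;
run;

/* Bootstrap standard error and 95% percentile interval for c. */
proc univariate data=cstat noprint;
   var c;
   output out=ci mean=c_mean std=c_se
          pctlpts=2.5 97.5 pctlpre=c_p;
run;
```

The output data set ci contains c_mean, c_se, and the percentile variables c_p2_5 and c_p97_5. This summarizes the sampling variability of c; it is not by itself an optimism-corrected validation estimate.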

 

> (2) the means of generating "overall statistics" on the discrimination and calibration of the prediction model after finishing the process above.

If you are interested in a calibration curve, PROC LOGISTIC can provide that for you, and it comes with a built-in confidence band:

proc logistic data=a plots=calibration(clm showobs);
   model x(event='1')=y z/lackfit;
run;

Sorry that I do not understand your goals. Good luck with your project.

