Sophie4
Calcite | Level 5

Hi,

In order to validate a multivariate logistic regression model, I'd like to perform a bootstrap analysis that resamples the residuals.

I am a student and new to SAS, and I am having problems with the practical application. I've read some explanations, but they only cover resampling observations, not residuals. I did find the following explanation of how to resample residuals in linear regression models, but I am having trouble adapting it to my logistic model, since I model the probability of an event rather than an actual value. Could someone help me adapt the following code, or suggest another approach? I am using SAS 9.3. Any help is highly appreciated.

 

from http://www2.sas.com/proceedings/forum2007/183-2007.pdf

 

%let regressors = x;
%let indata = temp1;

/* 1: perform the regression and get the predicted and residual values */
proc reg data=&INDATA;
   model y = &REGRESSORS;
   output out=out1 p=yhat r=res;
run;
quit;

/* 2: split the data: only the residuals will require URS */
data fit(keep=yhat &REGRESSORS order) resid(keep=res);
   set out1;
   order + 1;   /* running row counter, used later as the merge key */
run;

/* 3: this doesn't do any sampling - it copies the FIT data set repeatedly */
proc surveyselect data=fit out=outfit method=srs samprate=1 rep=1000;
run;

/* 4: this does the WR sampling of residuals for each replicate */
data outres2;
   do replicate = 1 to 1000;
      do order = 1 to numrecs;
         p = ceil(numrecs * ranuni(394747373));   /* random row number in 1..numrecs */
         set resid nobs=numrecs point=p;
         output;
      end;
   end;
   stop;
run;

/* 5: then the randomized residuals are merged with the unrandomized records */
data prepped;
   merge outfit outres2;
   by replicate order;
   new_y = yhat + res;
run;

/* 6: the bootstrap process runs on each replicate */
/*    (NOPRINT suppresses the printed output for the 1000 fits) */
proc reg data=prepped outest=est1(drop=_:) noprint;
   model new_y = &REGRESSORS;
   by replicate;
run;
quit;

/* 7: and the sampling distribution is aggregated */
proc univariate data=est1;
   var x;
   output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;

proc print data=final;
run;

Reeza
Super User

I think the only actual change is to the PROC REG. You would change that to PROC LOGISTIC. 

The other main thing, which you've already pointed out, is that PROC LOGISTIC generates a probability not a 1/0 output. 

I'm not sure how valid resampling residuals is for a binary outcome either, since once you classify the predictions as 1/0 your residuals will be -1, 0 or 1.

In essence though, you pick a cutoff, for example if PROB>0.7 then Event=1, else Event=0. You could probably add this into your STEP2 code. This will give you your estimate that can be then used in the remaining steps as outlined in your original post.
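
A minimal sketch of what steps 1 and 2 might look like with that change, reusing the macro variables from the original post. It assumes y is a numeric 0/1 variable with 1 as the event; the names phat and event_hat are made up for illustration, and 0.7 is just the example cutoff mentioned above. Whether the result is statistically meaningful is a separate question.

```sas
/* Step 1 (adapted): fit the logistic model and save the predicted probability. */
/* Assumption: y is numeric and coded 0/1, with 1 as the event of interest.     */
proc logistic data=&INDATA;
   model y(event='1') = &REGRESSORS;
   output out=out1 p=phat;          /* phat = predicted probability of y=1 */
run;

/* Step 2 (adapted): apply the cutoff, then split as in the original code. */
data fit(keep=yhat &REGRESSORS order) resid(keep=res);
   set out1;
   order + 1;                       /* running row counter for the later merge */
   event_hat = (phat > 0.7);        /* 1 if above the example cutoff, else 0 */
   yhat = event_hat;                /* classified prediction */
   res  = y - event_hat;            /* residual is -1, 0 or 1 */
run;
```

Steps 3 to 5 would then run unchanged. Note that new_y = yhat + res can take values outside 0/1 (for example 2 or -1) after the residuals are shuffled, so it cannot be fed to PROC LOGISTIC in step 6 without further handling.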

 

So this is technically easy to do, but whether it is statistically valid, I'm not sure.


Sophie4
Calcite | Level 5

Thank you both! I decided to apply a different validation method, since I am not sure about the validity of my findings. 

 

This is the only 'implementation paper' I've found on this topic. The authors combine bootstrapping with a weighting step for logistic regressions. For those who are interested: http://jsrad.org/wp-content/2016/Issue%205,%202016/9j.pdf

StatDave
SAS Super FREQ

While not exactly what you are asking for, note that you can get statistics on a chosen validation fraction of your data by using the PARTITION statement in PROC HPLOGISTIC. You can also use the SELECTION statement with CHOOSE=VALIDATE if you want to do model selection using statistics computed on the validation data to select effects in the model. Also note that predicted probabilities from the fitted model using a "leave one out" cross-validation approximation are available in PROC LOGISTIC using the PREDPROBS=CROSSVALIDATE option in the OUTPUT statement. Cross-validated predicted probabilities are also used in producing the classification results from the CTABLE option. 
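
A rough sketch of how the options mentioned above might be written, reusing the data set and regressor macro variables from the original post. The 30% validation fraction, the forward selection method, the 0.5 classification cutoff, the output data set name cvpred, and the 0/1 coding of y are all assumptions made for illustration.

```sas
/* Hold out a validation fraction and choose the model on validation statistics. */
/* Assumption: y is coded 0/1 with 1 as the event; 0.3 is an arbitrary fraction. */
proc hplogistic data=&INDATA;
   model y(event='1') = &REGRESSORS;
   partition fraction(validate=0.3);
   selection method=forward(choose=validate);
run;

/* Leave-one-out cross-validated predicted probabilities and classification table. */
proc logistic data=&INDATA;
   model y(event='1') = &REGRESSORS / ctable pprob=0.5;
   output out=cvpred predprobs=crossvalidate;
run;
```

With a single regressor, as in the original example, the SELECTION statement has little to choose from; it becomes useful once &REGRESSORS lists several candidate effects.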
