Programming the statistical procedures from SAS

Estimates from a logistic regression model with bootstraps

New Contributor
Posts: 3

Estimates from a logistic regression model with bootstraps


Hello - I'm trying to derive and save an unbiased logistic regression model using the estimates output from 500 bootstraps.  I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set.  I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates.  (Let me know if anyone disagrees with this approach.)  I now want to save this model and use it to score an external dataset.  I could hard code it, but I want the output I would get from the SCORE statement in PROC LOGISTIC.  Any help would be appreciated! Thanks
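For readers following along, the workflow described in this post might look roughly like the sketch below. It is only an illustration under stated assumptions: the data set name dev, the response y, and the continuous predictors x1 and x2 are hypothetical placeholders, not the poster's actual variables, and the poster's own code may well differ.

```sas
/* Hypothetical names: DEV (development set), Y (binary response), X1, X2 */

/* Draw 500 bootstrap replicates: sample with replacement, same size as DEV */
proc surveyselect data=dev out=boot seed=12345
                  method=urs samprate=1 outhits reps=500;
run;

/* Fit the same model in every replicate and capture the estimates */
ods exclude all;                        /* suppress printed output */
proc logistic data=boot;
    by replicate;
    model y(event='1') = x1 x2;
    ods output ParameterEstimates=pe;   /* one row per parameter per replicate */
run;
ods exclude none;

/* Median of each coefficient across the replicates */
proc means data=pe noprint nway;
    class variable;
    var estimate;
    output out=med median=estimate;
run;
```

With only continuous predictors, the Variable column identifies each parameter uniquely. A model with CLASS effects would also need the class level column of the ParameterEstimates table in the CLASS statement of PROC MEANS.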


Accepted Solutions
Solution
‎06-22-2018 12:11 PM
SAS Employee
Posts: 386

Re: Estimates from a logistic regression model with bootstraps

I assume that whatever you did has left you with a final set of model coefficients that you want to use to score a new data set for the purpose of obtaining the ROC analysis. You can score the data from the coefficients as outlined in section 4 of this note. Using the logistic() function as mentioned there to get the predicted probabilities for the new data, you can then use the PRED= option in the ROC statement as shown in this note to get the ROC analysis.
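The linked notes are not reproduced in this thread, so the following is only a minimal sketch of the idea in this reply, not a copy of the notes. The coefficient values, the data set valid, and the variables y, x1, and x2 are all hypothetical.

```sas
/* Hypothetical coefficients; substitute your own bootstrap medians */
data valid_scored;
    set valid;                          /* VALID: external data with Y, X1, X2 */
    xbeta = -1.5 + 0.8*x1 + 0.3*x2;     /* linear predictor */
    p = logistic(xbeta);                /* predicted probability of the event */
run;

/* ROC analysis of the supplied probabilities; NOFIT skips model fitting */
proc logistic data=valid_scored;
    model y(event='1') = / nofit;
    roc 'Bagged model' pred=p;
run;
```

Because the probabilities come from the DATA step, PROC LOGISTIC is used here only to compute the ROC curve and its area (the c statistic), not to estimate anything.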



All Replies
Respected Advisor
Posts: 3,069

Re: Estimates from a logistic regression model with bootstraps


@LD4224 wrote:

I now want to save this model and use it to score an external dataset. 


What model? The model of the bootstrap medians?

 

I don't think you can get this any other way than by hard-coding it, or perhaps by clever use of a macro.

 

I have not heard of using the bootstrap medians as the new model.

@LD4224 wrote:

I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set.  I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates.  (Let me know if anyone disagrees with this approach.)

 

I assume this means you did some form of stepwise selection (or forward or backward selection) when you fit the original model. If so, I think the bootstrap ought to repeat that selection in each replicate to see whether different variables are selected; that would be important to know.

 

Also, I strenuously object to the title of this post, and almost reported it as spam — MODERATORS can you change the title?

--
Paige Miller
Regular Contributor
Posts: 167

Re: Estimates from a logistic regression model with bootstraps

I don't understand. Is the issue that you just want to avoid hard coding? If so, why not take the median (across bootstraps) from, e.g., PROC UNIVARIATE, make it a macro variable, and then use that as a coefficient in the PROC LOGISTIC scoring code?

--------------
blog: papersandprograms.com
New Contributor
Posts: 3

Re: Estimates from a logistic regression model with bootstraps

Posted in reply to PaulBrownPhD
Thanks for replying.



To clarify:

Variable selection was done in the development set using the AIC (stepwise selection, SLENTRY=1 SLSTAY=1). I realize that Harrell and others recommend using bootstrapping for variable selection, but I'm sticking with the stepwise AIC approach.



Using the mean or median of the coefficients obtained in the bootstrapped samples is referred to as "bootstrap aggregating" or "bagging" of coefficients.



It's not that I want to avoid hard scoring. The way I've used Score in the past is as such, which allows me to get the ROC graphs and c statistic for the scored dataset.


PROC LOGISTIC DATA=WORK.BRAIST_SMS;
    CLASS THORACIC (REF='0' PARAM=REF) SMSC2 (REF='1.2' PARAM=REF);
    MODEL VTERM (EVENT='1') = SMSC2 THORACIC COBBMAX;
    /* Score the validation set; OUTROC= saves its ROC coordinates */
    SCORE DATA=WORK.VALID_SMS OUT=VALIDP OUTROC=VROC;
    ROC;
    ROCCONTRAST;
RUN;



Is there a way to take the coefficients from the bootstraps and create something that would function like the "outmodel" does below?
proc logistic data = hsb2 outmodel=pout;
    model honcomp = read math;
run;

/* Reload the stored model and score new data; CLM adds confidence limits */
proc logistic inmodel=pout;
    score clm data = toscore out=pred;
run;



Ideas?



Thanks
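One possibility that nobody in this thread confirms, so treat it as an assumption to test: pass the aggregated coefficients to PROC LOGISTIC through an INEST= data set and set MAXITER=0 so the procedure stops at those values instead of re-estimating. The sketch assumes a data set med with one row per parameter and columns variable and estimate; med, dev, valid, y, x1, and x2 are all hypothetical names.

```sas
/* Turn the one-row-per-parameter table into one row with one column per parameter */
proc transpose data=med out=inest(drop=_name_);
    id variable;        /* column names become Intercept, x1, x2 */
    var estimate;
run;

/* MAXITER=0: use the INEST= values as the estimates rather than refitting */
proc logistic data=dev inest=inest;
    model y(event='1') = x1 x2 / maxiter=0;
    score data=valid out=validp outroc=vroc;
run;
```

The log may warn that convergence was not attained, and the standard errors and tests printed for the model are not meaningful for bagged coefficients. CLASS effects use generated parameter names, so it is worth inspecting an OUTEST= data set from an ordinary fit to see which column names INEST= expects.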





Regular Contributor
Posts: 167

Re: Estimates from a logistic regression model with bootstraps


It's still not clear to me what you're asking. Consider my original answer, i.e. use a macro variable. You didn't indicate why that wouldn't work; if you did, that would help me understand your question. Don't worry about explaining the bootstrap etc. I get that; I just don't know what you want (if it's not what I already assumed).

 

edit: re "It's not that I want to avoid hard scoring" [i assume you meant hard coding], clearly you don't want to hard code, otherwise you would do it in 2 seconds, and you said yourself: "I could hard code it, but ...". So that seems to me to be the issue and that's easily solved with a macro variable

--------------
blog: papersandprograms.com
New Contributor
Posts: 3

Re: Estimates from a logistic regression model with bootstraps

Posted in reply to PaulBrownPhD
Thanks. Can you explain how the macro would function and how I would write it? Or even how I would hard code it? I have looked all over and can't find any examples.
Regular Contributor
Posts: 167

Re: Estimates from a logistic regression model with bootstraps

Posted in reply to StatDave_sas

Incidentally, that is what I meant by 'hard coding', i.e. in that example they simply write out the coefficients. It would be better to define macro variables to minimise the possibility of misspecifying the model, I guess. For example, the following type of thing is not uncommon:

 

proc univariate data=....;
    var x;
    output out=m1 mean=mean;
run;

data m2;
    set m1;
    /* SYMPUTX converts the number to text and trims blanks, avoiding a conversion note in the log */
    call symputx('mean1', mean);
run;

proc nlmixed data=....;
:

:

    estimate 'Treatment A' exp(mu + &mean1.*b1 + 0.5*b2 + b3 + b4);
    estimate 'Treatment B'   exp(mu + &mean1.*b1 + 0.5*b2 + b4);
run;
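Applied to the logistic scoring question in this thread, the same macro variable idea could be sketched as follows. Everything named here is hypothetical: the med table and its values, the b_ prefix for the macro variables, the data set valid, and the predictors x1 and x2.

```sas
/* Hypothetical table of bagged (median) coefficients, one row per parameter */
data med;
    input variable :$32. estimate;
    datalines;
Intercept -1.5
x1 0.8
x2 0.3
;

/* One macro variable per coefficient: &b_Intercept, &b_x1, &b_x2 */
data _null_;
    set med;
    call symputx(cats('b_', variable), estimate);
run;

/* Score the external data with the stored coefficients */
data valid_scored;
    set valid;
    p = logistic(&b_Intercept + &b_x1*x1 + &b_x2*x2);
run;
```

The coefficients are typed only once, in the table that feeds the macro variables, so a change to the bootstrap results flows through to the scoring step without editing the formula.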

--------------
blog: papersandprograms.com