Estimates from a logistic regression model with bootstraps


Posted 06-21-2018 03:46 PM
(1670 views)

Hello - I'm trying to derive and save an unbiased logistic regression model using the estimates output from 500 bootstraps. I derived a logistic regression model in my development set, then used the retained variables in a MODEL statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.) I now want to save this model and use it to score an external dataset. I could hard code it, but I want the output I would get from the SCORE statement in PROC LOGISTIC. Any help would be appreciated! Thanks
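For readers following along, the workflow described above might look roughly like the sketch below. This is not the poster's actual code: the data set `work.dev`, the 0/1 outcome `y`, the continuous predictors `x1` and `x2`, and the seed are all hypothetical placeholders. With CLASS variables the ParameterEstimates table also carries a level column, which would need to be added to the grouping.

```sas
/* Hypothetical names: work.dev (development set), y (0/1 outcome), x1 x2 (retained predictors) */

/* 1. Draw 500 bootstrap replicates: sampling with replacement,
      each replicate the same size as the development set */
proc surveyselect data=work.dev out=work.boot seed=12345
                  method=urs samprate=1 outhits reps=500;
run;

/* 2. Refit the same model in every replicate and capture the estimates */
ods exclude all;                        /* suppress printed output for 500 fits */
proc logistic data=work.boot;
   by Replicate;                        /* Replicate is created by PROC SURVEYSELECT */
   model y(event='1') = x1 x2;
   ods output ParameterEstimates=work.pe;
run;
ods exclude none;

/* 3. Median of each coefficient across the replicates */
proc means data=work.pe noprint nway;
   class Variable;                      /* one row per parameter: Intercept, x1, x2 */
   var Estimate;
   output out=work.bagged(drop=_type_ _freq_) median=Estimate;
run;
```

The result, `work.bagged`, holds one median estimate per parameter, which is the "model" the rest of the thread is trying to score with.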


@LD4224 wrote:

I now want to save this model and use it to score an external dataset.

What model? The model of the bootstrap medians?

I don't think you can get this any other way than by hard-coding it, or perhaps by clever use of a macro.

I have not heard of using the bootstrap medians as the new model.

I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.)

I assume this means you did some form of stepwise selection (or forward or backward selection) when you fit the original model. If so, I think the bootstrap ought to repeat the stepwise selection in each replicate, so you can see whether different variables are selected. That would be important to know.

Also, I strenuously object to the title of this post, and almost reported it as spam — MODERATORS can you change the title?

--

Paige Miller


Thanks for replying.

To clarify:

Variable selection was done in the development set using the AIC (stepwise selection, SLENTRY=1 SLSTAY=1). I realize that Harrell and others recommend using bootstrapping for variable selection, but I'm sticking with the stepwise AIC approach.

Using the mean or median of the coefficients obtained in the bootstrapped samples is referred to as "bootstrap aggregating" or "bagging" of coefficients.

It's not that I want to avoid hard scoring. The way I've used Score in the past is as such, which allows me to get the ROC graphs and c statistic for the scored dataset.

    PROC LOGISTIC DATA=WORK.BRAIST_SMS;
       CLASS THORACIC (REF='0' PARAM=REF) SMSC2 (REF='1.2' PARAM=REF);
       MODEL VTERM (EVENT='1') = SMSC2 THORACIC COBBMAX;
       SCORE DATA=WORK.VALID_SMS OUT=VALIDP OUTROC=VROC;
       ROC;
       ROCCONTRAST;
    RUN;

Is there a way to take the coefficients from the bootstraps and create something that would function like the "outmodel" does below?

    proc logistic data = hsb2 outmodel=pout;
       model honcomp = read math;
    run;

    proc logistic inmodel=pout;
       score clm data = toscore out=pred ;
    run;

Ideas?

Thanks
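One possible way to get OUTMODEL-like behaviour from externally computed coefficients is to pass them to PROC LOGISTIC through the INEST= option and stop the optimizer from updating them with MAXITER=0. The sketch below is an assumption to check against the PROC LOGISTIC documentation, not a confirmed answer from this thread. The coefficient values are made up, and the data set `work.bagged_est` is a hypothetical name. The variable names in the INEST= data set have to match the model's parameter names, which gets more involved when CLASS variables are present.

```sas
/* Hypothetical coefficient values standing in for the bootstrap medians */
data work.bagged_est;
   _TYPE_    = 'PARMS';     /* marks the row as parameter estimates */
   Intercept = -11.77;
   read      = 0.0763;
   math      = 0.1322;
run;

/* MAXITER=0 keeps the supplied estimates instead of refitting;
   expect a convergence warning in the log, since no iterations are run */
proc logistic data=hsb2 inest=work.bagged_est;
   model honcomp(event='1') = read math / maxiter=0;
   score data=toscore out=pred;
run;
```

If this behaves as expected, the SCORE statement options used earlier in this post (for example OUTROC=) should be available in the same way, because the scoring is still done by PROC LOGISTIC.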


It's still not clear to me what you're asking. Consider my original answer, i.e. use a macro variable. You didn't indicate why that wouldn't work; if you did, that would help me understand your question. Don't worry about explaining the bootstrap etc., I get that. I just don't know what you want (if it's not what I already assumed).

edit: re "It's not that I want to avoid hard scoring" [i assume you meant hard coding], clearly you don't want to hard code, otherwise you would do it in 2 seconds, and you said yourself: "I could hard code it, but ...". So that seems to me to be the issue and that's easily solved with a macro variable


Thanks. Can you explain how the macro would function and how I would write it? Or even how I would hard code it? I have looked all over and can't find any examples.


Incidentally, that is what I meant by 'hard coding', i.e. in that example they simply write out the coefficients. It would be better to define macro variables, I guess, to minimise the possibility of misspecifying the model. For example, the following type of thing is not uncommon:

    proc univariate data=....;
       var x;
       output out=m1 mean=mean;
    run;

    data m2;
       set m1;
       /* store the mean in macro variable &mean1;
          symputx trims blanks and avoids a numeric-to-character note */
       call symputx('mean1', mean);
    run;

    proc nlmixed data=....;
       :
       :
       estimate 'Treatment A' exp(mu + &mean1.*b1 + 0.5*b2 + b3 + b4);
       estimate 'Treatment B' exp(mu + &mean1.*b1 + 0.5*b2 + b4);
    run;
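Applied to the original question, hard coding the bagged logistic model could look roughly like the sketch below. Everything named here is a placeholder: `work.valid` (the external data set), `y` (its 0/1 outcome), `x1` and `x2` (the predictors), and the three coefficient values, which stand in for the bootstrap medians. The second step assumes that the ROC statement's PRED= option together with NOFIT can be used to get the ROC curve and c statistic for precomputed probabilities, which is worth confirming in the PROC LOGISTIC documentation.

```sas
/* All names and coefficient values below are placeholders */
data work.scored;
   set work.valid;
   /* linear predictor from the hard-coded intercept and slopes */
   xbeta = -1.5 + 0.04*x1 + 0.8*x2;
   /* inverse logit gives the predicted probability of the event */
   p_hat = 1 / (1 + exp(-xbeta));
run;

/* ROC curve and c statistic for the precomputed probabilities;
   NOFIT means no model is estimated in this step */
proc logistic data=work.scored;
   model y(event='1') = p_hat / nofit;
   roc 'Bagged model' pred=p_hat;
run;
```

The literal coefficients in the DATA step could be replaced by macro variables, in the same way `&mean1` is used above, so that the values are copied from the bootstrap summary rather than retyped.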
