Estimates from a logistic regression model with bo...

06-21-2018 03:46 PM - last edited on 06-21-2018 04:27 PM by Reeza

Hello - I'm trying to derive and save an unbiased logistic regression model using the estimates output from 500 bootstraps. I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.) I now want to save this model and use it to score an external dataset. I could hard code it, but I want the output I would get by using the SCORE statement in PROC LOGISTIC. Any help would be appreciated! Thanks
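For readers following along, the aggregation step described above could be sketched roughly as follows. This is only an illustration under assumptions: the data set names BOOT_PE and BAG_EST are hypothetical, and BOOT_PE is assumed to stack the ODS ParameterEstimates tables from all replicates (for example, from a BY-group run of PROC LOGISTIC). The ClassVal0 column is only present when the model has CLASS variables.

```sas
/* Assumption: BOOT_PE holds one row per parameter per bootstrap replicate, */
/* as written by ODS OUTPUT ParameterEstimates= in a BY-group run.          */
/* BOOT_PE, BOOT_PE_SORTED and BAG_EST are hypothetical names.              */
proc sort data=boot_pe out=boot_pe_sorted;
    by Variable ClassVal0;   /* drop ClassVal0 if there are no CLASS effects */
run;

/* Median of each coefficient across replicates.                            */
/* N_REPS shows how many replicates contributed to each median.             */
proc means data=boot_pe_sorted noprint;
    by Variable ClassVal0;
    var Estimate;
    output out=bag_est (drop=_type_ _freq_) median=Estimate n=n_reps;
run;
```

Checking that N_REPS equals the number of replicates for every parameter is a quick way to spot replicates in which a parameter could not be estimated.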

All Replies

Posted in reply to LD4224

06-21-2018 03:52 PM - last edited on 06-21-2018 04:27 PM by Reeza

@LD4224 wrote:

"I now want to save this model and use it to score an external dataset."

What model? The model of the bootstrap medians?

I don't think you can get this any other way than by hard-coding it, or perhaps by clever use of a macro.

I have not heard of using the bootstrap medians as the new model.

"I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.)"

I assume this means you did some form of stepwise selection (or forward or backward selection) when you fit the original model. If so, I think the bootstrap ought to repeat the stepwise selection as well, to see whether different variables are selected. That would be important to know.

Also, I strenuously object to the title of this post, and almost reported it as spam. MODERATORS, can you change the title?

--

Paige Miller

Posted in reply to LD4224

06-21-2018 06:04 PM

I don't understand. Is the issue that you just want to avoid hard coding? If so, why not take the median (across bootstraps) from, e.g., PROC UNIVARIATE, store it in a macro variable, and then use that as a coefficient in the PROC LOGISTIC scoring code?

--------------

blog: papersandprograms.com

Posted in reply to PaulBrownPhD

06-21-2018 07:42 PM

Thanks for replying.

To clarify:

Variable selection was done in the development set using the AIC (stepwise selection, SLENTRY=1 SLSTAY=1). I realize that Harrell and others recommend using bootstrapping for variable selection, but I'm sticking with the stepwise AIC approach.

Using the mean or median of the coefficients obtained in the bootstrapped samples is referred to as "bootstrap aggregating" or "bagging" of coefficients.

It's not that I want to avoid hard scoring. The way I've used Score in the past is as such, which allows me to get the ROC graphs and c statistic for the scored dataset.

PROC LOGISTIC DATA=WORK.BRAIST_SMS;
    CLASS THORACIC (REF='0' PARAM=REF) SMSC2 (REF='1.2' PARAM=REF);
    MODEL VTERM (EVENT='1') = SMSC2 THORACIC COBBMAX;
    SCORE DATA=WORK.VALID_SMS OUT=VALIDP OUTROC=VROC;
    ROC;
    ROCCONTRAST;
RUN;

Is there a way to take the coefficients from the bootstraps and create something that would function like the "outmodel" does below?

proc logistic data=hsb2 outmodel=pout;
    model honcomp = read math;
run;

proc logistic inmodel=pout;
    score clm data=toscore out=pred;
run;

Ideas?

Thanks

Posted in reply to LD4224

06-21-2018 08:38 PM - edited 06-21-2018 08:41 PM

It's still not clear to me what you're asking. Consider my original answer, i.e. use a macro variable. You didn't indicate why that wouldn't work; if you did, that would help me understand your question. Don't worry about explaining the bootstrap etc., I get that. I just don't know what you want (if it's not what I already assumed).

Edit: re "It's not that I want to avoid hard scoring" [I assume you meant hard coding]: clearly you don't want to hard code, otherwise you would do it in 2 seconds, and you said yourself: "I could hard code it, but ...". So that seems to me to be the issue, and it's easily solved with a macro variable.

--------------

blog: papersandprograms.com

Posted in reply to PaulBrownPhD

06-21-2018 11:39 PM

Thanks. Can you explain how the macro would function and how I would write it? Or even how I would hard code it? I have looked all over and can't find any examples.
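One possible way to hard code coefficients and still get SCORE statement output is to pass them to PROC LOGISTIC through an INEST= data set and stop the fit from iterating with MAXITER=0. The sketch below is an assumption-laden illustration, not a tested solution from this thread: BAG_PARMS is a hypothetical data set name, the three coefficient values are placeholders rather than real estimates, and the model reuses the hsb2 example above with continuous predictors only (CLASS effects would need the design-variable names that OUTEST= produces).

```sas
/* Hypothetical one-row data set of externally supplied coefficients.  */
/* Variable names must match the model parameters; values below are    */
/* placeholders, to be replaced by the bootstrap medians.              */
data bag_parms;
    Intercept = -10.0;   /* placeholder value */
    read      =   0.1;   /* placeholder value */
    math      =   0.1;   /* placeholder value */
run;

/* INEST= supplies starting values; MAXITER=0 keeps them unchanged, so */
/* SCORE uses the supplied coefficients. The EVENT= level must match   */
/* the one used when the bootstrap models were fit.                    */
proc logistic data=hsb2 inest=bag_parms;
    model honcomp(event='1') = read math / maxiter=0;
    score data=toscore out=pred outroc=vroc fitstat;
run;
```

SAS may log a convergence warning because no iterations are run; that is expected with this approach. OUTROC= and FITSTAT require the observed response (honcomp here) to be present in the data set being scored.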

Solution

06-22-2018 12:11 PM

Posted in reply to LD4224

06-22-2018 10:04 AM

Posted in reply to StatDave_sas

a month ago - last edited a month ago

Incidentally, that is what I meant by 'hard coding', i.e. in that example they simply write out the coefficients. It would be better to define macro variables to minimise the possibility of misspecifying the model, I guess. For example, the following type of thing is not uncommon:

proc univariate data=....;
    var x;
    output out=m1 mean=mean;
run;

data m2;
    set m1;
    /* store the mean of x in the macro variable &mean1 */
    call symputx('mean1', mean);
run;

proc nlmixed data=....;
    :
    :
    estimate 'Treatment A' exp(mu + &mean1.*b1 + 0.5*b2 + b3 + b4);
    estimate 'Treatment B' exp(mu + &mean1.*b1 + 0.5*b2 + b4);
run;
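Applied to the logistic case in this thread, the same macro-variable idea might look like the sketch below. It is a rough illustration under several assumptions: BAG_EST is a hypothetical data set with one row per parameter (columns Variable and Estimate, e.g. the bootstrap medians), the model is the hsb2 example with continuous predictors only, and PRED_BAG, xbeta and p_bag are made-up names. The last step relies on the PRED= option of the ROC statement to get an ROC curve and AUC for probabilities computed outside the procedure.

```sas
/* Assumption: BAG_EST has rows for Intercept, read and math.          */
/* Creates the macro variables &b_Intercept, &b_read and &b_math.      */
data _null_;
    set bag_est;
    call symputx(cats('b_', Variable), Estimate);
run;

/* Score the external data by hand: linear predictor, then inverse logit. */
data pred_bag;
    set toscore;
    xbeta = &b_Intercept + &b_read*read + &b_math*math;
    p_bag = 1 / (1 + exp(-xbeta));
run;

/* NOFIT skips model fitting; PRED= tells ROC to use the supplied      */
/* probabilities. The response honcomp must be present in PRED_BAG.    */
proc logistic data=pred_bag;
    model honcomp(event='1') = p_bag / nofit;
    roc 'Bagged coefficients' pred=p_bag;
run;
```

Building the macro variable names from the Variable column avoids retyping each coefficient, which is the misspecification risk mentioned above. With CLASS effects the Variable values repeat across levels, so the naming scheme would need to include the level as well.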

--------------

blog: papersandprograms.com
