Hello - I'm trying to derive and save an unbiased logistic regression model using the estimates output from 500 bootstraps. I derived a logistic regression model in my development set, then used the retained variables in a MODEL statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.) I now want to save this model and use it to score an external dataset. I could hard code it, but I want the output I would get by using the SCORE statement in PROC LOGISTIC. Any help would be appreciated! Thanks
I assume that whatever you did has left you with a final set of model coefficients that you want to use to score a new data set for the purpose of obtaining the ROC analysis. You can score the data from the coefficients as outlined in section 4 of this note. Using the logistic() function as mentioned there to get the predicted probabilities for the new data, you can then use the PRED= option in the ROC statement as shown in this note to get the ROC analysis.
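As a rough sketch of that suggestion (not the code from the linked notes): the data set name NEWDATA, the variables Y, X1, and X2, and the coefficient values are all hypothetical placeholders. The DATA step builds the linear predictor from the final coefficients and converts it to a probability with the LOGISTIC function; PROC LOGISTIC with NOFIT then evaluates that probability through PRED= on a ROC statement.

```sas
/* Hypothetical names: NEWDATA, Y, X1, X2. Coefficients are placeholders. */
data scored;
  set newdata;
  xbeta = -1.25 + 0.80*x1 - 0.45*x2;  /* intercept + betas (placeholder values) */
  p_hat = logistic(xbeta);            /* same as 1/(1+exp(-xbeta)) */
run;

/* NOFIT: do not fit a model, only evaluate the supplied predictions */
proc logistic data=scored;
  model y(event='1') = / nofit;
  roc 'Bootstrap-median model' pred=p_hat;
run;
```

This assumes Y is coded so that '1' is the event of interest; change EVENT= if the response is coded differently.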
@LD4224 wrote:
I now want to save this model and use it to score an external dataset.
What model? The model of the bootstrap medians?
I don't think you can get this any other way than by hard-coding it, or perhaps by clever use of a macro.
I have not heard of using the bootstrap medians as the new model.
I derived a logistic regression model in my development set, then used the retained variables in a model statement and ran it in 500 bootstrapped replicates of my development set. I then retrieved the estimates from the ODS ParameterEstimates tables and calculated the medians of the intercepts and beta estimates. (Let me know if anyone disagrees with this approach.)
I assume this means you did some form of stepwise selection (or forward or backward selection) when you fit the original model. If so, I think the bootstrap ought to repeat the selection in each replicate as well, so you can see whether different variables are selected. That would be important to know.
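One way this might look, as a sketch only: DEV, Y, and X1-X5 are hypothetical names, and the selection settings are placeholders rather than the original poster's. It assumes the default behavior (no DETAILS option), so that only the final model of each replicate lands in the ParameterEstimates table.

```sas
/* Hypothetical names: DEV, Y, X1-X5. Seed and entry/stay levels are placeholders. */
proc surveyselect data=dev out=boot seed=12345
                  method=urs samprate=1 outhits reps=500;
run;

ods exclude all;                 /* suppress 500 sets of printed output */
proc logistic data=boot;
  by Replicate;
  model y(event='1') = x1-x5 / selection=stepwise slentry=0.05 slstay=0.05;
  ods output ParameterEstimates=pe_sel;
run;
ods exclude none;

/* How often was each candidate variable retained across the replicates? */
proc freq data=pe_sel;
  where Variable ne 'Intercept';
  tables Variable / nocum;
run;
```

A variable that is retained in only a small share of the replicates would be a warning sign about the stability of the original selection.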
Also, I strenuously object to the title of this post, and almost reported it as spam — MODERATORS can you change the title?
I don't understand. Is the issue that you just want to avoid hard coding? If so, why not take the median (across bootstraps) from, e.g., PROC UNIVARIATE, store it in a macro variable, and then use that as a coefficient in the PROC LOGISTIC scoring code?
Still not clear to me what you're asking. Consider my original answer, i.e. use a macro variable. You didn't indicate why that wouldn't work; if you did, that would help me understand your question. Don't worry about explaining the bootstrap etc., I get that. I just don't know what you want (if it's not what I already assumed).
edit: re "It's not that I want to avoid hard scoring" [i assume you meant hard coding], clearly you don't want to hard code, otherwise you would do it in 2 seconds, and you said yourself: "I could hard code it, but ...". So that seems to me to be the issue and that's easily solved with a macro variable
I assume that whatever you did has left you with a final set of model coefficients that you want to use to score a new data set for the purpose of obtaining the ROC analysis. You can score the data from the coefficients as outlined in section 4 of this note. Using the logistic() function as mentioned there to get the predicted probabilities for the new data, you can then use the PRED= option in the ROC statement as shown in this note to get the ROC analysis.
Incidentally, that is what I meant by 'hard coding', i.e. in that example they simply write out the coefficients. It would be better to define macro variables to minimise the possibility of misspecifying the model, I guess. For example, the following type of thing is not uncommon:
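/* Compute the mean of x and save it in data set m1 */
proc univariate data=....;
var x;
output out=m1 mean=mean;
run;
/* Store the mean in macro variable &mean1. */
/* CALL SYMPUTX trims blanks and avoids the numeric-to-character conversion note. */
data _null_;
set m1;
call symputx('mean1', mean);
run;
/* Use the macro variable instead of typing the value by hand */
proc nlmixed data=....;
:
:
estimate 'Treatment A' exp(mu + &mean1.*b1 + 0.5*b2 + b3 + b4);
estimate 'Treatment B' exp(mu + &mean1.*b1 + 0.5*b2 + b4);
run;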
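If the goal is specifically the output of the SCORE statement, one possible alternative to macro variables is to pass the medians to PROC LOGISTIC as starting values and prevent any iteration. The sketch below assumes that PE holds the stacked ParameterEstimates from the 500 BY-group fits of the same model, and that DEV, NEWDATA, Y, X1, and X2 are hypothetical names. With MAXITER=0 the procedure should score with the INEST= values as given, though it may print convergence warnings, and the standard errors it reports are not bootstrap standard errors.

```sas
/* PE: stacked ParameterEstimates from the bootstrap fits (assumed to exist) */
proc means data=pe noprint nway;
  class Variable;
  var Estimate;
  output out=med median=Estimate;   /* median per parameter */
run;

/* One row with columns Intercept, x1, x2, ... as INEST= expects */
proc transpose data=med out=inest(drop=_name_);
  id Variable;
  var Estimate;
run;

data inest;
  set inest;
  _type_ = 'PARMS';                 /* mark the row as parameter values */
run;

/* MAXITER=0: keep the supplied values instead of refitting */
proc logistic data=dev inest=inest;
  model y(event='1') = x1 x2 / maxiter=0;
  score data=newdata out=scored fitstat;
run;
```

It is worth checking the printed parameter estimates against the medians in MED before relying on the scored output.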