Hello, everyone. I am troubled by a problem regarding logistic regression.
My task is to internally validate a prediction model using the bootstrap resampling method.
According to Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, the "flow chart" for a basic version of this process is:
(1) Fit a logistic regression model to each bootstrap sample (i.e. use each bootstrap sample as the training set);
(2) Apply each of these models to the original dataset (i.e. use the original dataset as the validation set);
(3) Average the statistics (e.g. area under the ROC curve) obtained when each bootstrap-sample model is applied to the original dataset.
Easy as the process might seem, I am stuck on the problem of outputting the statistics without printing them. Here is my code:
proc surveyselect data=a out=b method=balbootstrap reps=2000; /* a is the original dataset */
run;

ods select none;

proc logistic data=b outmodel=m;
  by replicate;
  class a b c / param=ref ref=first;
  model y(event='1') = a b c / parmlabel lackfit aggregate scale=pearson
        selection=backward sls=0.09 ctable stb rsq;
  store n;
run;

proc plm source=n;
  score data=a out=s;
run;

proc logistic inmodel=m;
  by replicate;
  score data=a out=result fitstat outroc=roc;
run;
As the code shows, I tried both PROC PLM and PROC LOGISTIC with the INMODEL= option, but neither was satisfactory. Both produce statistics at the individual level. That is, they calculate prior and posterior probabilities for every observation in the original dataset. That is not what I want.
In PROC LOGISTIC, SAS can produce fit statistics when FITSTAT is added to the SCORE statement. As the code shows, I have done this. Frustratingly, although these fit statistics (e.g. R-square, Brier score, area under the ROC curve) are exactly what I want, I cannot find a way to output them to a dataset. The "result" dataset does not contain them. So it seems that the only way of "getting the numbers out of the printing window" is to let the output print and send it to Excel with a statement such as ods tagsets.excelxp or ods excel. I could then calculate the mean of the statistics in Excel.
But since there are 2000 replications (2000 bootstrap samples), the amount of printing and the time it takes may be formidable. What is more, the computer might simply crash without generating anything useful. That is what happened on another occasion, when I tried to send all of the displayed results for 2000 bootstrap samples to a PDF: the computer eventually generated a 1.3 GB PDF, but it could not be opened, because the computer reported errors every time I tried.
So here is my question: is there a way to write the fit statistics produced by the SCORE statement of PROC LOGISTIC to a dataset?
Many thanks!