Hi all. I'm running a logistic regression for odds of receiving a skeletal survey in children less than 1 year of age admitted to the hospital for an accidental fall. My database is the National Trauma Data Bank, and there are over 5000 facilities included in my cohort. My model is the following:
proc logistic data = logreg.alldata;
model sksurvey (event = '1')= mnatam_pi masian mblack moth_mix
mhisp maid mother mloc mISS/rsquare;
run;
The first 5 variables are race indicators; maid = primary payment by Medicaid, mother = other primary payment method, mloc = injury occurred outside the home, mISS = Injury Severity Score (ISS) greater than 10.
My Rsquare is very low (0.041), and I think this might be due to variation of the outcome (skeletal survey) by facility, for which I have a variable. I've never created a fixed effects model; would someone who is more familiar with this coding help point me in the right direction?
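It's not clear why you consider the R-squared low. PROC LOGISTIC doesn't produce an OLS-style R-squared: the RSQUARE option reports a generalized R-square (Cox and Snell) and a max-rescaled R-square (Nagelkerke), and these are typically small for binary outcomes, so a low value is not by itself a sign of a missing term. So your reasoning for adding a term into the model seems suspect.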
Nevertheless, you can add facility to the model by listing it in a CLASS statement and then adding it to the MODEL statement.
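A minimal sketch of that suggestion, assuming the facility identifier is stored in a variable named FACILITY (a hypothetical name; substitute your own). PARAM=REF gives reference coding, with the last facility level as the reference by default:

```sas
/* Facility as a fixed effect: one indicator per facility via CLASS. */
/* FACILITY is a hypothetical variable name for the facility ID.     */
proc logistic data=logreg.alldata;
   class facility / param=ref;   /* reference coding, last level = reference */
   model sksurvey(event='1') = mnatam_pi masian mblack moth_mix
         mhisp maid mother mloc mISS facility / rsquare;
run;
```

With more than 5000 facilities this adds thousands of parameters, and facilities in which every child did (or did not) receive a survey can cause quasi-complete separation. A different technique that also treats facility as fixed is conditional logistic regression: replace the CLASS statement with STRATA facility; and leave facility out of the MODEL statement, so the facility effects are conditioned out rather than estimated.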
Can you post the output of the model?
Use SELECTION= to shrink your model, CORRB to check for multicollinearity among the variables, and drop the outliers (observations).
proc logistic data = logreg.alldata;
model sksurvey (event = '1')= mnatam_pi masian mblack moth_mix
mhisp maid mother mloc mISS/rsquare selection=stepwise corrb ;
run;
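CORRB shows correlations among the parameter estimates, which is only an indirect collinearity check. A common complementary check, sketched here as an alternative rather than as part of the advice above, is to compute variance inflation factors with PROC REG on the same predictors. VIF depends only on the predictors, so the least-squares fit of SKSURVEY below is just a vehicle for the diagnostic, not a model to interpret:

```sas
/* VIF and TOL depend only on the predictors; the OLS fit is used   */
/* purely to obtain collinearity diagnostics, not as the analysis.  */
proc reg data=logreg.alldata;
   model sksurvey = mnatam_pi masian mblack moth_mix
         mhisp maid mother mloc mISS / vif tol;
run;
quit;   /* PROC REG is interactive, so end it explicitly */
```

A VIF well above 10 is a common rule-of-thumb warning sign.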
Here's the output of the original model
You only have 9 variables in the model.
Please also post the parameter estimates table, which looks like this:
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept   1   -0.6941         10.1967           0.0046      0.9457
Age         1    1.1785          0.7807           2.2785      0.1312
Weight      1   -0.0829          0.0637           1.6907      0.1935
Height      1   -0.1111          0.2534           0.1921      0.6611
And the correlation matrix of the estimates (from CORRB):
Estimated Correlation Matrix
Parameter  Intercept        Age     Weight     Height
Intercept     1.0000    -0.1298     0.6428    -0.8131
Age          -0.1298     1.0000    -0.4392    -0.4047
Weight        0.6428    -0.4392     1.0000    -0.5208
Height       -0.8131    -0.4047    -0.5208     1.0000
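Try the FIRTH option (Firth's penalized likelihood, which reduces small-sample bias in the estimates and helps with separation):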
proc logistic data=sashelp.class;
model sex=age weight height/firth corrb;
run;
And if you want to improve the AUC (a.k.a. the c statistic), drop some outlying observations with this code:
proc logistic data=want outest=est(keep=intercept &varlist);
   model good_bad(event='good') = &varlist
         / outroc=x.roc lackfit scale=none aggregate rsquare firth;
   /* H= leverage; C= and CBAR= confidence interval displacement (influence) */
   output out=output h=h c=c cbar=cbar;
run;
proc sort data=output out=check_c ;
by descending c;
run;
proc sort data=output out=check_h ;
by descending h;
run;
In tables CHECK_C and CHECK_H you will find the outliers (the top n observations).
Make an ID variable, drop these observations, and fit PROC LOGISTIC to the new data again; you will get a better AUC.
data want;
   set have;
   id + 1;   /* sequential row number */
   if id in (237 764 334 93 305 178 918) then delete;   /* outliers found in CHECK_C / CHECK_H */
run;
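The same steps can be chained so the IDs do not have to be typed by hand. This is a sketch that uses SASHELP.CLASS as a stand-in data set (the model SEX = AGE WEIGHT HEIGHT mirrors the FIRTH example, FIRTH is left off here, and dropping the top 5 by C is an arbitrary illustrative cutoff; DROP_IDS and WANT2 are hypothetical names). Because the OUTPUT data set keeps all input variables, an ID created before fitting is carried through to the diagnostics:

```sas
/* Number the rows before fitting so OUT= carries the ID. */
data want;
   set sashelp.class;
   id + 1;
run;

/* Fit the model and save the C influence statistic per observation. */
proc logistic data=want;
   model sex(event='F') = age weight height;
   output out=output c=c;
run;

/* Largest C values first. */
proc sort data=output out=check_c;
   by descending c;
run;

/* IDs of the 5 most influential observations (illustrative cutoff). */
data drop_ids;
   set check_c(obs=5 keep=id);
run;

/* Data without those observations, ready to refit. */
proc sql;
   create table want2 as
   select * from want
   where id not in (select id from drop_ids);
quit;
```

Refit PROC LOGISTIC on WANT2 and compare the c statistic with the original fit.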
But I prefer goodness-of-fit statistics, such as:
model good_bad(event='good')= &varlist / lackfit scale=none aggregate rsquare firth; /* GOF - if you have SAS9.4m6*/