I used the following SAS syntax to estimate the model-predicted probabilities of being in level 0 of the outcome variable. The model is a multinomial logistic model with a cumulative logit link. The outcome variable, sex_plea, has three levels: 0, 1, and 2. Because the association between age and the outcome probability was nonlinear, I created a spline variable for age (spl_age). I used this statement to estimate the predicted probabilities and save them in a file: output out=Female_Probs pred(ilink)=Prob;. I am new to models with spline variables. Could you please check whether that statement for the predicted probabilities is correct? If not, how can I fix it? Thank you in advance!
Your OUTPUT statement is correct, as Koen says, but note that the predicted values it provides from your ordinal model are cumulative predicted probabilities: Pr(SEX_PLEA=0) and Pr(SEX_PLEA=0 or 1) for your three-level response. If you want predicted probabilities of the individual response levels, you need to compute them by subtraction. However, as Koen also noted, this model is more conveniently fit using PROC LOGISTIC. PROC LOGISTIC offers the PREDPROBS= option in its OUTPUT statement, which lets you request the individual predicted probabilities, the cumulative ones, or both. Note that the PREDPROBS= option outputs only one observation for each input observation, with the various probabilities in separate variables, which is much more convenient than GLIMMIX, which outputs multiple observations for each input observation. The PARAM=GLM option is needed to use the same coding for categorical predictors as GLIMMIX uses. For example, the following statements fit the model and output both types of predicted probabilities.
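proc logistic data=apc_4w;
   /* natural cubic spline for age with knots at 60, 70, and 80 */
   effect spl_age = spline(age / details naturalcubic basis=tpf(noint) knotmethod=list(60 70 80));
   /* PARAM=GLM matches the GLIMMIX coding of the CLASS variables */
   class ethgrp educ job marital hearng / param=glm ref=first;
   model sex_plea = spl_age c_cohort c_cohort2 marital ethgrp educ job hearng;
   /* I = individual, C = cumulative predicted probabilities */
   output out=Female_Probs predprobs=(i c);
run;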
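To make the "compute them by subtraction" step concrete, here is a sketch of the arithmetic for the three-level response. The symbols $\alpha_j$, $\mathbf{x}$, and $\boldsymbol{\beta}$ are notation chosen here for illustration, not anything taken from the procedure output. The sketch assumes the cumulative probabilities accumulate from the lowest level (0) upward, matching the Pr(SEX_PLEA=0) and Pr(SEX_PLEA=0 or 1) values described above.

```latex
\begin{align*}
\operatorname{logit}\Pr(Y \le j \mid \mathbf{x}) &= \alpha_j + \mathbf{x}^\top\boldsymbol{\beta}, \qquad j = 0, 1 \\
\Pr(Y = 0) &= \Pr(Y \le 0) \\
\Pr(Y = 1) &= \Pr(Y \le 1) - \Pr(Y \le 0) \\
\Pr(Y = 2) &= 1 - \Pr(Y \le 1)
\end{align*}
```

With made-up values Pr(Y ≤ 0) = 0.2 and Pr(Y ≤ 1) = 0.7, the individual probabilities are 0.2, 0.5, and 0.3. They sum to 1 for every observation, which gives a quick sanity check on the probabilities in the output data set.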
I guess the levels (0, 1, and 2) of your response variable "sex_plea" have a natural ordering, and that is why you fit a model that uses cumulative logits. Cumulative logits are for an ordinal response; a model that uses generalized logits is more appropriate for a nominal response.
I think your output statement is correct.
PREDICTED with the ILINK option gives you the predicted mean, that is, the prediction on the probability scale rather than the logit scale.
But why don't you use PROC LOGISTIC?
You are not fitting a generalized linear mixed model (one with random effects), which is what PROC GLIMMIX was built for.
Here is an example of an ordinal logistic regression with the LOGISTIC procedure.
https://go.documentation.sas.com/doc/en/statug/latest/statug_logistic_examples03.htm
You can specify LINK=CUMLOGIT in the MODEL statement and PROC LOGISTIC also has an EFFECT statement (for your spline effect) and an output statement (to obtain your predicted probabilities).
Note that the MODEL statement in the example above has no LINK= option. That's because LINK=LOGIT is the default. PROC LOGISTIC with LINK=LOGIT fits the binary logit model when there are two response categories and the cumulative logit model when there are more than two. The aliases are CLOGIT and CUMLOGIT. The logit function is the inverse of the cumulative logistic distribution function. You can still specify LINK=CUMLOGIT explicitly if you prefer.
BR, Koen