Hello all,
I am using PROC LOGISTIC to run a multivariable multinomial logistic regression. The dependent variable has 5 levels and there are ten categorical independent variables.
I would like to get the adjusted frequencies of the independent variables for each level of the dependent variable, but I do not get them.
For example, I would like to obtain the adjusted frequency of women (sex being one of the independent variables) at levels 1, 2, 3, 4, and 5 of the dependent variable. These are the values highlighted in yellow (e.g., the adjusted frequency of women in Profile 1 is 36%).
Here is part of the code I used:
proc logistic data=taxes.analyse ;
class sexe(ref="Men") X2(ref="...") X3(ref="...") X4(ref="BAC") X5(ref="1300€-2600€") /param=glm;
model profileQ(event="5")= sexe X2 X3 X4 X5 / expb clodds=wald orpvalue link=glogit ;
lsmeans sexe / means cl ilink exp oddsratio ;
weight weight_obs;
run;
Does anyone have an idea?
Best
Florian
I think that's a substantively different question, and for that you need discriminant function analysis.
However, I would also say, there's not really any benefit to doing discriminant function analysis over logistic regression. Logistic regression is easier to understand, and the information is essentially mathematically identical.
I have a hard time understanding what it would mean to "adjust a frequency" when the values in your picture of the output are shown as percentages. The highlighted values are RATES, not counts or frequencies.
If you want to adjust a rate, what adjustment do you want to apply?
Thanks for your answers.
Actually, what I'm trying to calculate is an adjusted rate. Sorry for the confusion.
I want to calculate these adjusted rates because the results of a multinomial logistic regression (i.e., odds ratios) are never very easy to interpret: you have to keep the characteristics of the reference class in mind.
Regarding the rates, I want to adjust them for the other independent variables (for example, sex would be adjusted for X1, X2, etc.).
Florian
Ah, you're overthinking it.
Just take your code, choose a predictor of interest, PREDICTOR1, remove it as a predictor in the equation, and rerun the model with
WHERE PREDICTOR1 = 1;
In general when you think "adjusted" predictions, you are talking about either LS-means, as available from the LSMEANS statement, or predictive margins, as from the Margins macro. For LS-means, you can add the LSMEANS statement in your PROC LOGISTIC step. Note that least squares means are simply linear combinations of the model parameters. The adjustment is in the coefficients used on the predictors other than the one of interest. You can see these coefficients by adding the E option, and you can adjust these, if needed, with the OM= option. The ILINK option applies the inverse of the logit link to get predicted probabilities (what you are calling a "rate").
lsmeans sex / ilink e;
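For reference, the generalized logit model requested by LINK=GLOGIT can be written as below. The notation is my own and not from the thread: K is the number of response levels, the last level K is taken as the reference, and x is the vector of predictors for one observation.

```latex
\[
\eta_j(x) \;=\; \log\frac{\Pr(Y=j \mid x)}{\Pr(Y=K \mid x)} \;=\; \alpha_j + x^\top \beta_j ,
\qquad j = 1,\dots,K-1 ,
\]
\[
\Pr(Y=j \mid x) \;=\; \frac{e^{\eta_j(x)}}{1+\sum_{l=1}^{K-1} e^{\eta_l(x)}} ,
\qquad
\Pr(Y=K \mid x) \;=\; \frac{1}{1+\sum_{l=1}^{K-1} e^{\eta_l(x)}} .
\]
```

There are only K-1 linear predictors, and the probability of the reference level is whatever remains after the other K-1 probabilities are accounted for.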
The problem with the LSMEANS statement is that it can only provide adjusted predicted probabilities for the first k-1 levels of a response with k levels (since there are only k-1 logits). An obvious way to get adjusted estimates for all response levels is to simply average the predicted probabilities from the model for each response level. You can do that by adding an OUTPUT statement to save the predicted probabilities for all observations.
output out=preds predprobs=individual;
and then average them
proc means data=preds mean; class sex; var ip:; run;
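Put together, the OUTPUT approach might look like the sketch below. It uses the variable name sexe from the original post (the statements above write sex). The REF= values for X2 and X3 are elided in the post, so the sketch leaves them at their default reference level. Weighting the average by weight_obs is my assumption, made only because the model itself is weighted.

```sas
/* Sketch only: names come from the original post, except where noted. */
proc logistic data=taxes.analyse;
  /* X2 and X3 use the default reference level because their REF=     */
  /* values are elided in the post.                                   */
  class sexe(ref="Men") X2 X3 X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ = sexe X2 X3 X4 X5 / link=glogit;
  weight weight_obs;
  /* PREDPROBS=INDIVIDUAL adds one IP_<level> variable per response level. */
  output out=preds predprobs=individual;
run;

/* Average the predicted probabilities within each level of sexe.     */
/* The WEIGHT statement here is an assumption, not part of the reply. */
proc means data=preds mean;
  class sexe;
  var ip_:;
  weight weight_obs;
run;
```

The resulting table has one row per level of sexe and one mean per response level, so all five profiles are covered.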
The other possibility is predictive margins. Unfortunately, the Margins macro cannot be used with a multinomial response model. However, you can easily compute point estimates of the predictive margins since they are simply averages of predicted probabilities when all observations are fixed at one level of the predictor. You can do that by creating versions of your data with all observations set to Men or Women.
data m; set taxes.analyse; sex='Men'; run;
data w; set taxes.analyse; sex='Women'; run;
then add SCORE statements to apply the fitted model to each of these data sets.
score data=m out=mpreds;
score data=w out=wpreds;
and then average each
proc means data=mpreds mean; var p_:; run;
proc means data=wpreds mean; var p_:; run;
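As a single program, the predictive-margins steps above could be sketched as follows. Again the variable is written sexe as in the original post, X2 and X3 are left at their default reference levels because their REF= values are elided there, and the level value "Women" is assumed to match the data. The WEIGHT statements in PROC MEANS are my assumption.

```sas
/* Copies of the data with every observation fixed at one level of sexe. */
data m; set taxes.analyse; sexe = "Men";   run;
data w; set taxes.analyse; sexe = "Women"; run;

/* Fit the model once on the real data, then score both copies. */
proc logistic data=taxes.analyse;
  class sexe(ref="Men") X2 X3 X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ = sexe X2 X3 X4 X5 / link=glogit;
  weight weight_obs;
  score data=m out=mpreds;   /* adds P_<level> predicted probabilities */
  score data=w out=wpreds;
run;

/* P_: selects only the predicted-probability variables. A bare p:     */
/* prefix would also match profileQ.                                    */
proc means data=mpreds mean; var p_:; weight weight_obs; run;
proc means data=wpreds mean; var p_:; weight weight_obs; run;
```

Each PROC MEANS step gives the point estimate of the predictive margin for one level of sexe across all five response levels. No standard errors or confidence limits come out of this approach.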
In short, the Mean column from your LSMEANS statement gives you what you want except that it excludes adjusted estimates for the last (fifth) level of your response. To get those estimates, simply refit the model and change the order of the response levels by using the ORDER= option, in parentheses, following the response variable name in the MODEL statement. This will make a different response level be the one missing from the LSMEANS table. So, together with the first analysis, you will have estimates for all response levels.
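A second fit along those lines might look like the sketch below. It uses the REF= response option rather than ORDER= to choose which level is left out. That is a substitution on my part, and it assumes the response levels are labelled 1 to 5, as event="5" in the original code suggests.

```sas
/* Sketch: refit with level 1 as the reference so that level 5 appears  */
/* in the LSMEANS table. REF="1" is an assumed level label.             */
proc logistic data=taxes.analyse;
  class sexe(ref="Men") X2 X3 X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ(ref="1") = sexe X2 X3 X4 X5 / link=glogit;
  lsmeans sexe / ilink cl e;
  weight weight_obs;
run;
```

Taking the estimates for levels 1 to 4 from the first fit and the estimate for level 5 from this one gives a value for every response level.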
Thank you all for your answers. They are very clear and helpful!
Florian