Statistical Procedures

FlorianM · Posted 09-28-2023 08:47 AM

Hello all,

I am using PROC LOGISTIC to run a multivariable multinomial logistic regression. The dependent variable has 5 levels and there are ten categorical independent variables.

I would like to get the adjusted frequencies of the independent variables for each level of the dependent variable, but I do not get them.

For example, I would like to obtain the adjusted frequency of women (i.e., one level of an independent variable) at levels 1, 2, 3, 4, and 5 of the dependent variable. These values would be the ones highlighted in yellow (e.g., the adjusted frequency of women in Profile 1 is 36%).

Here is part of the code I used:

proc logistic data=taxes.analyse;
  /* PARAM=GLM is required for the LSMEANS statement */
  class sexe(ref="Men") X2(ref="...") X3(ref="...") X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  /* generalized logit link for the 5-level response */
  model profileQ(event="5") = sexe X2 X3 X4 X5 / expb clodds=wald orpvalue link=glogit;
  lsmeans sexe / means cl ilink exp oddsratio;
  weight weight_obs;
run;

Does anyone have an idea?

Best

Florian

awesome_opossum · Posted 09-28-2023 10:09 AM

I think that's a substantively different question, and for that you need discriminant function analysis.

However, I would also say, there's not really any benefit to doing discriminant function analysis over logistic regression. Logistic regression is easier to understand, and the information is essentially mathematically identical.

ballardw · Posted 09-28-2023 10:39 AM

I have a hard time getting past the fact that the values in your picture of the output are shown as percentages, yet you want to "adjust a frequency" (whatever that may actually mean). The highlighted values are RATES, not counts or frequencies.

If you want to adjust a rate, what adjustment do you want to apply?

FlorianM · Posted 09-28-2023 11:32 AM

Thanks for your answers.

Actually, what I'm trying to calculate is an adjusted rate. Sorry for the confusion.

I want to calculate these adjusted rates because the interpretation of the results of a multinomial logistic regression (i.e., odds ratios) is never very obvious (you must keep the characteristics of the reference class in mind).

Regarding the rates, I want to adjust them for the other independent variables (for example, sex would be adjusted for X1, X2, etc.).

Florian

awesome_opossum · Posted 09-28-2023 12:00 PM

Ah, you're overthinking it.

Just take your code, choose a predictor of interest, PREDICTOR1, remove it as a predictor in the equation, and rerun the model with

WHERE PREDICTOR1 = 1;

StatDave · Posted 09-28-2023 12:03 PM

In general when you think "adjusted" predictions, you are talking about either LS-means, as available from the LSMEANS statement, or predictive margins, as from the Margins macro. For LS-means, you can add the LSMEANS statement in your PROC LOGISTIC step. Note that least squares means are simply linear combinations of the model parameters. The adjustment is in the coefficients used on the predictors other than the one of interest. You can see these coefficients by adding the E option, and you can adjust these, if needed, with the OM= option. The ILINK option applies the inverse of the logit link to get predicted probabilities (what you are calling a "rate").

lsmeans sex / ilink e;

The problem with the LSMEANS statement is that it can only provide adjusted predicted probabilities for the first k-1 levels of a response with k levels (since there are only k-1 logits). An obvious way to get adjusted estimates for all response levels is to simply average the predicted probabilities from the model for each response level. You can do that by adding an OUTPUT statement to save the predicted probabilities for all observations.

output out=preds predprobs=individual;

and then average them

proc means data=preds mean; class sex; var ip:; run;
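
Put together, the OUTPUT approach above might look like the sketch below. It assumes the variable names from the original post (sexe, X4, X5, profileQ, weight_obs) and leaves out X2 and X3 only because their reference levels are not shown in the thread. The WEIGHT statement in PROC MEANS is an assumption that is not part of the suggestion above; drop it if an unweighted average is wanted.

```sas
/* Sketch only: fit the generalized logit model and save the
   predicted probability of every response level for each observation. */
proc logistic data=taxes.analyse;
  class sexe(ref="Men") X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ = sexe X4 X5 / link=glogit;
  weight weight_obs;
  /* PREDPROBS=INDIVIDUAL adds one IP_<level> variable per response level */
  output out=preds predprobs=individual;
run;

/* Average the predicted probabilities within each level of sexe. */
proc means data=preds mean;
  class sexe;
  var ip_:;
  weight weight_obs;  /* assumption: weight the average like the model */
run;
```

Each row of the PROC MEANS output then holds one average per response level, including the reference level that the LSMEANS table leaves out.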

The other possibility is predictive margins. Unfortunately, the Margins macro cannot be used with a multinomial response model. However, you can easily compute point estimates of the predictive margins since they are simply averages of predicted probabilities when all observations are fixed at one level of the predictor. You can do that by creating versions of your data with all observations set to Men or Women.

data m; set taxes.analyse; sex='Men'; run;
data w; set taxes.analyse; sex='Women'; run;

then add SCORE statements to apply the fitted model to each of these data sets.

score data=m out=mpreds;
score data=w out=wpreds;

and then average each

proc means data=mpreds mean; var p:; run;
proc means data=wpreds mean; var p:; run;
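
As a single program, the predictive-margins steps above could be assembled roughly as follows. This is a sketch under assumptions: it uses the variable name sexe from the original post rather than sex, assumes the stored values are exactly "Men" and "Women", and omits X2 and X3 because their reference levels are not shown. Stacking the two scored data sets before averaging is a convenience added here so that one table is produced; it gives only point estimates, with no standard errors.

```sas
/* Counterfactual copies of the data: everyone set to Men, then to Women. */
data m; set taxes.analyse; sexe="Men";   run;
data w; set taxes.analyse; sexe="Women"; run;

/* Fit on the real data, then score both counterfactual copies. */
proc logistic data=taxes.analyse;
  class sexe(ref="Men") X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ = sexe X4 X5 / link=glogit;
  weight weight_obs;
  score data=m out=mpreds;  /* adds P_<level> predicted probabilities */
  score data=w out=wpreds;
run;

/* Stack the scored copies; sexe now identifies which copy a row came from. */
data margins;
  set mpreds wpreds;
run;

proc means data=margins mean;
  class sexe;
  var p_:;
run;
```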

StatDave · Posted 09-29-2023 12:18 PM

In short, the Mean column from your LSMEANS statement gives you what you want except that it excludes adjusted estimates for the last (fifth) level of your response. To get those estimates, simply refit the model and change the order of the response levels by using the ORDER= option, in parentheses, following the response variable name in the MODEL statement. This will make a different response level be the one missing from the LSMEANS table. So, together with the first analysis, you will have estimates for all response levels.
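
One possible form of that refit is sketched below. Note that it uses the DESCENDING response option rather than ORDER= itself, as a simple way to reverse the level order so that level 1, not level 5, becomes the reference; that substitution is an assumption, as is the omission of X2 and X3, whose reference levels are not shown in the thread.

```sas
/* Sketch only: same model with the response order reversed, so the
   LSMEANS table now covers levels 5, 4, 3 and 2 and omits level 1. */
proc logistic data=taxes.analyse;
  class sexe(ref="Men") X4(ref="BAC") X5(ref="1300€-2600€") / param=glm;
  model profileQ(descending) = sexe X4 X5 / link=glogit;
  lsmeans sexe / ilink cl;
  weight weight_obs;
run;
```

The estimate for level 5 from this run can then be read alongside the levels 1 to 4 from the first analysis.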

FlorianM · Posted 10-02-2023 04:21 AM

Thank you all for your answers. They are very clear and helpful!

Florian

Adjusted frequency : proc logistic