Adjusted frequency: proc logistic

Posted 09-28-2023 08:47 AM
(390 views)

Hello all,

I am using PROC LOGISTIC to run a multivariable multinomial logistic regression. The dependent variable has 5 levels and there are ten categorical independent variables.

I would like to get the adjusted frequencies of the independent variables for each level of the dependent variable, but I do not get them.

For example, I would like to obtain the **adjusted** frequency of women (i.e., an independent variable) at levels 1, 2, 3, 4, and 5 of the dependent variable. These values would be the ones highlighted in yellow (e.g., the adjusted frequency of women in Profile 1 is 36%).

Here is part of the code I used:

```
proc logistic data=taxes.analyse ;
class sexe(ref="Men") X2(ref="...") X3(ref="...") X4(ref="BAC") X5(ref="1300€-2600€") /param=glm;
model profileQ(event="5")= sexe X2 X3 X4 X5 / expb clodds=wald orpvalue link=glogit ;
lsmeans sexe / means cl ilink exp oddsratio ;
weight weight_obs;
run;
```

Does anyone have an idea?

Best

Florian

1 ACCEPTED SOLUTION

In general when you think "adjusted" predictions, you are talking about either LS-means, as available from the LSMEANS statement, or predictive margins, as from the Margins macro. For LS-means, you can add the LSMEANS statement in your PROC LOGISTIC step. Note that least squares means are simply linear combinations of the model parameters. The adjustment is in the coefficients used on the predictors other than the one of interest. You can see these coefficients by adding the E option, and you can adjust these, if needed, with the OM= option. The ILINK option applies the inverse of the logit link to get predicted probabilities (what you are calling a "rate").

```
lsmeans sex / ilink e;
```

The problem with the LSMEANS statement is that it can only provide adjusted predicted probabilities for the first k-1 levels of a response with k levels (since there are only k-1 logits). An obvious way to get adjusted estimates for all response levels is to simply average the predicted probabilities from the model for each response level. You can do that by adding an OUTPUT statement to save the predicted probabilities for all observations.

```
output out=preds predprobs=individual;
```

and then average them

```
proc means data=preds mean; class sex; var ip:; run;
```
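Put together, the OUTPUT-and-average approach might look like the sketch below. It makes a few assumptions that are not in the reply: the predictor is named `sexe` as in the original post (the reply writes `sex`), the reference levels are left at their defaults because they do not change the predicted probabilities, and the averages reuse the question's `weight_obs` weight, which is a design choice you may or may not want.

```sas
/* Refit the model and save each observation's predicted probabilities. */
proc logistic data=taxes.analyse;
  /* Reference levels omitted: they do not affect predicted probabilities. */
  class sexe X2 X3 X4 X5 / param=glm;
  model profileQ = sexe X2 X3 X4 X5 / link=glogit;
  weight weight_obs;
  /* PREDPROBS=INDIVIDUAL adds one IP_<level> variable per response level. */
  output out=preds predprobs=individual;
run;

/* Average the predicted probabilities within each observed level of sexe. */
proc means data=preds mean;
  class sexe;
  var ip_:;
  weight weight_obs;  /* assumption: weight the averages like the model */
run;
```

Each row of the PROC MEANS output then gives, for one level of `sexe`, the mean predicted probability of every response level, so all 5 profiles are covered rather than only k-1 of them.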

The other possibility is predictive margins. Unfortunately, the Margins macro cannot be used with a multinomial response model. However, you can easily compute point estimates of the predictive margins since they are simply averages of predicted probabilities when all observations are fixed at one level of the predictor. You can do that by creating versions of your data with all observations set to Men or Women.

```
data m; set taxes.analyse; sex='Men'; run;
data w; set taxes.analyse; sex='Women'; run;
```

then add SCORE statements to apply the fitted model to each of these data sets.

```
score data=m out=mpreds;
score data=w out=wpreds;
```

and then average each

```
proc means data=mpreds mean; var p_:; run;
proc means data=wpreds mean; var p_:; run;
```
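As a rough end-to-end sketch, the predictive-margin steps could be assembled as follows. The assumptions here are mine, not the reply's: the predictor is the character variable `sexe` from the original post with the stored values 'Men' and 'Women', the `margins` data set and its `scenario` variable are hypothetical names introduced only to stack the two scored copies, and the averages are weighted by `weight_obs`.

```sas
/* Counterfactual copies: everyone set to Men, then everyone set to Women. */
data m; set taxes.analyse; sexe='Men';   run;
data w; set taxes.analyse; sexe='Women'; run;

/* Fit once on the real data, then score both counterfactual copies. */
proc logistic data=taxes.analyse;
  class sexe X2 X3 X4 X5 / param=glm;
  model profileQ = sexe X2 X3 X4 X5 / link=glogit;
  weight weight_obs;
  /* SCORE adds one P_<level> variable per response level. */
  score data=m out=mpreds;
  score data=w out=wpreds;
run;

/* Stack the scored copies so a single PROC MEANS reports both margins. */
data margins;
  length scenario $5;  /* hypothetical label variable */
  set mpreds(in=inm) wpreds;
  scenario = ifc(inm, 'Men', 'Women');
run;

proc means data=margins mean;
  class scenario;
  var p_:;             /* P_ prefix avoids picking up profileQ itself */
  weight weight_obs;   /* assumption: weight the averages like the model */
run;
```

Because every observation keeps its own values of X2 through X5 in both copies, the difference between the two rows reflects only the change in `sexe`. This gives point estimates only; it does not produce standard errors or confidence limits.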

7 REPLIES

I think that's a substantively different question, and for that you need discriminant function analysis.

However, I would also say, there's not really any benefit to doing discriminant function analysis over logistic regression. Logistic regression is easier to understand, and the information is essentially mathematically identical.

I have a hard time understanding what it means to "adjust a **frequency**" when the highlighted values in your picture of the output are percentages. Those values are RATES, not counts or frequencies.

If you want to adjust a rate, what adjustment do you want to apply?

Thanks for your answers.

Actually, what I'm trying to calculate is an adjusted rate. Sorry for the confusion.

I want to calculate these adjusted rates because the results of multinomial logistic regressions (i.e., odds ratios) are never easy to interpret (you have to keep the characteristics of the reference class in mind).

Regarding the rates, I want to adjust them for the other independent variables (for example, sex would be adjusted for X1, X2, etc.).

Florian

Ah, you're overthinking it.

Just take your code, choose a predictor of interest, PREDICTOR1, remove it as a predictor in the equation, and rerun the model with

WHERE PREDICTOR1 = 1;

Thank you all for your answers. They are very clear and helpful!

Florian
