joe66
Calcite | Level 5

Hi,

 

What is the correct way to calculate predicted probabilities at each level of a predictor in a logistic regression using SAS?

 

The logistic regression model is as below:

 

outcome: success (binary, yes or no)

predictor: education level (binary, under or graduate)

control variables: age (age group) and gender

 

my SAS code:

(1) Fit the logistic model and export the predicted probability of success="Yes" for every observation:

proc logistic data=data;

   class age gender;

   model success(event="Yes")=age gender edu;

   output out=pred p=p;

run;

 

(2) Calculate the LS-means of the predicted probabilities for the predictor from the exported data set:

 

proc genmod data=pred;

   class age gender;

   model p=age gender edu;

   lsmeans edu;

run;

 

My understanding is that this gives the average predicted probability at each level of the predictor (under or graduate) while holding age and gender constant.

 

However, I have heard that it is better to calculate predicted probabilities in Stata using the “marginal standardization” method.

 

The Stata command looks like this:

 

margins edu, post

 

I compared the results from the two approaches and they differ, so I am wondering which one is better.

 

Thanks

5 Replies
PGStats
Opal | Level 21

Why not use the LSMEANS statement in PROC LOGISTIC and compare those results with the Stata estimates?

PG
joe66
Calcite | Level 5

Hi PG,

 

I used PROC GENMOD rather than PROC LOGISTIC because the response variable "p" is continuous. I did compare the SAS and Stata results and they differ, so I am wondering which approach is correct.

 

Thanks 

StatDave
SAS Super FREQ

The LSMEANS statement does not necessarily compute predictive margins which use the marginal standardization method you mention. However, in the case where you want predictive margins for one variable while holding all other predictors at their means, then I think the LSMEANS statement can be used. But note that the LSMEANS statement can only be used for a model effect that is (or is made up of) a CLASS variable, and all CLASS variables must use the non-full rank GLM parameterization. If your Age variable is grouped as you indicate, then all this can be done when you fit your model in PROC LOGISTIC. Use the ILINK option if you want the estimates at each Age level to be on the probability scale rather than the logit (log odds) scale. The E option shows you the linear combination of model parameters that the LSMEANS statement computes. Note that options are available in the LSMEANS statement (particularly OM= and BYLEVEL) to alter the coefficients that are used for the CLASS predictors. Of course, you can always use the ESTIMATE statement to compute any desired (but estimable) linear combination of the parameters.

 

proc logistic data=data;

   class age gender / param=glm;

   model success(event="Yes")=age gender edu;

   lsmeans age / ilink e;

run;
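
Since the original question is about edu rather than age, the same idea can be sketched for edu. This is only a minimal sketch under assumptions: it reuses the data set and variable names from the question and adds edu to the CLASS statement, because LSMEANS accepts only CLASS effects. The first LSMEANS statement weights the age and gender levels equally; the second adds the OM option so that they are weighted by their observed frequencies. In both cases the result is an inverse-linked linear combination of parameters, not an average of predicted probabilities.

```sas
proc logistic data=data;
   /* Assumption: edu is added to CLASS so it can appear in LSMEANS. */
   /* PARAM=GLM is required for the LSMEANS statement.               */
   class age gender edu / param=glm;
   model success(event="Yes") = age gender edu;

   /* Default coefficients: age and gender levels weighted equally. */
   /* E prints the coefficients, ILINK reports probabilities.       */
   lsmeans edu / ilink e cl;

   /* OM: age and gender levels weighted by observed frequencies. */
   lsmeans edu / ilink e cl om;
run;
```

Comparing the two coefficient tables printed by the E option shows how the weights given to the other CLASS predictors change between the two statements.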

joe66
Calcite | Level 5

Thanks for your reply!

 

I ran this code with the ILINK option in SAS and obtained predicted probabilities. However, they are about 10% higher than those produced by the Stata "margins" command.

 

So I am confused about which is the correct way to calculate predicted probabilities. Any comments are welcome!

 

Thanks again!

StatDave
SAS Super FREQ

Predictive margins and LS-means are not the same in general. LS-means are linear combinations of model parameters. Margins are averages of predicted values. They are the same only when margins are computed with all other predictors fixed. Typically, only the variable for which the margins are computed is fixed and the other predictors vary as observed when the averaging is done. If you want margins rather than LS-means, then use the Margins macro.
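
To make the distinction concrete, marginal standardization can also be carried out by hand. The following is a minimal sketch, not the Margins macro itself: it assumes the data set and variable names from the question, adds edu to the CLASS statement, and uses made-up names (logit_fit, cf, cf_pred, p_hat) for the stored model, work data sets, and prediction variable. Each observation is copied once per edu level with its own age and gender left as observed, every copy is scored with the fitted model, and the predicted probabilities are averaged within each edu level.

```sas
/* Step 1: fit the model and store it for scoring. */
proc logistic data=data;
   class age gender edu / param=glm;
   model success(event="Yes") = age gender edu;
   store work.logit_fit;   /* hypothetical item-store name */
run;

/* Step 2: one copy of each observation per observed edu level. */
/* age and gender stay as observed; only edu is set by design.  */
proc sql;
   create table cf as
   select d.age, d.gender, l.edu
   from work.data as d,
        (select distinct edu from work.data
         where edu is not missing) as l
   where d.success is not missing;
quit;

/* Step 3: score every copy; ILINK returns probabilities. */
proc plm restore=work.logit_fit;
   score data=cf out=cf_pred predicted=p_hat / ilink;
run;

/* Step 4: average the predicted probabilities by edu level. */
proc means data=cf_pred mean;
   class edu;
   var p_hat;
run;
```

This sketch yields point estimates only. The Margins macro also provides standard errors and confidence limits, so it remains the better choice for reporting.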
