- how to use SAS to generate predicted probabilities...


05-11-2018 05:25 PM

Hi,

I have a question: what is the correct way to calculate predicted probabilities by predictor level in a logistic regression using SAS?

The logistic regression model is as below:

outcome: success (binary, yes or no)

predictor: education level (binary, under or graduate)

control variables: age (age group) and gender

My SAS code:

(1) Fit the logistic model and output the predicted probability of the event "Yes" for every observation:

proc logistic data=data;
  class age gender;
  model success(event="Yes") = age gender edu;
  output out=pred p=p;
run;

(2) Calculate the LS-means of the predicted probabilities for the predictor, using the exported data:

proc genmod data=pred;
  class age gender;
  model p = age gender edu;
  lsmeans edu;
run;

As I understand it, this gives the average predicted probability at each predictor level (under or graduate) while holding age and gender constant.

However, I have heard that it is better to calculate predicted probabilities in Stata using the "marginal standardization" method.

The Stata command is:

margins edu, post

I compared the results from the two approaches and they differ, so I am wondering which approach is better.

Thanks


Posted in reply to joe66

05-12-2018 12:28 AM

Why not use the LSMEANS statement in PROC LOGISTIC and compare those estimates with the Stata estimates?

PG


Posted in reply to PGStats

05-14-2018 02:50 AM

Hi PG,

I used PROC GENMOD rather than PROC LOGISTIC because the outcome variable "p" is continuous. I did compare the SAS and Stata results, and they differ, so I am wondering which approach is correct.

Thanks


Posted in reply to joe66

05-14-2018 10:39 AM

The LSMEANS statement does not necessarily compute predictive margins which use the marginal standardization method you mention. However, in the case where you want predictive margins for one variable while holding all other predictors at their means, then I think the LSMEANS statement can be used. But note that the LSMEANS statement can only be used for a model effect that is (or is made up of) a CLASS variable, and all CLASS variables must use the non-full rank GLM parameterization. If your Age variable is grouped as you indicate, then all this can be done when you fit your model in PROC LOGISTIC. Use the ILINK option if you want the estimates at each Age level to be on the probability scale rather than the logit (log odds) scale. The E option shows you the linear combination of model parameters that the LSMEANS statement computes. Note that options are available in the LSMEANS statement (particularly OM= and BYLEVEL) to alter the coefficients that are used for the CLASS predictors. Of course, you can always use the ESTIMATE statement to compute any desired (but estimable) linear combination of the parameters.

proc logistic data=data;
  class age gender / param=glm;
  model success(event="Yes") = age gender edu;
  lsmeans age / ilink e;
run;
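
For reference, the two quantities being compared in this thread can be written out. This is only a sketch under assumed notation, none of which appears in the posts: $\hat\beta_0$ is the intercept, $\hat\alpha_a$ and $\hat\gamma_g$ are the Age and Gender effects, $\hat\delta$ is the coefficient of edu, $G$ is the number of Gender levels, and $i = 1, \dots, n$ indexes the observations. With its default coefficients, LSMEANS with ILINK applies the inverse logit once, at a single averaged covariate profile, whereas marginal standardization averages the individual predicted probabilities:

```latex
\hat p^{\,\mathrm{LS}}_a
  = \operatorname{expit}\!\Bigl(\hat\beta_0 + \hat\alpha_a
      + \frac{1}{G}\sum_{g=1}^{G}\hat\gamma_g
      + \hat\delta\,\overline{\mathrm{edu}}\Bigr),
\qquad
\hat p^{\,\mathrm{MS}}_a
  = \frac{1}{n}\sum_{i=1}^{n}
      \operatorname{expit}\!\bigl(\hat\beta_0 + \hat\alpha_a
      + \hat\gamma_{g_i} + \hat\delta\,\mathrm{edu}_i\bigr),
\qquad
\operatorname{expit}(x) = \frac{1}{1 + e^{-x}} .
```

Because expit is nonlinear, the inverse link of an average is generally not the average of the inverse links. The two quantities can therefore differ even when the averaged profile uses observed proportions (for example through the OM= option), so a gap between the two outputs does not by itself mean that either program is wrong.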


Posted in reply to StatDave_sas

05-17-2018 03:55 PM

Thanks for your reply!

I ran this code with the ILINK option in SAS and obtained the predicted probabilities. However, the results are about 10% higher than those generated by Stata's "margins" command.

So I am confused about which is the correct way to calculate predicted probabilities. Any comments are welcome!

Thanks again!
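
One way to check whether the gap comes from the averaging step is to reproduce marginal standardization directly in SAS by scoring counterfactual copies of the data. The following is a minimal, untested sketch rather than a definitive implementation. It assumes that edu is a numeric 0/1 variable (the thread does not state its coding), and the dataset names cf and scored and the variable edu_obs are hypothetical. It gives point estimates only and does not reproduce the standard errors that margins reports.

```sas
/* Assumption: edu is numeric and coded 0/1.                          */
/* Build two counterfactual copies of every observation:              */
/* one with edu forced to 0 and one with edu forced to 1.             */
data cf;
  set data;
  edu_obs = edu;          /* keep the observed value for reference    */
  do edu = 0, 1;
    output;
  end;
run;

/* Fit the model on the original data, then score the stacked copies. */
proc logistic data=data;
  class age gender / param=glm;
  model success(event="Yes") = age gender edu;
  score data=cf out=scored;   /* adds P_Yes, the predicted Pr(Yes)    */
run;

/* Average the predicted probabilities within each forced edu level.  */
/* Rows with missing age or gender get a missing P_Yes and are        */
/* skipped by PROC MEANS.                                             */
proc means data=scored mean;
  class edu;
  var P_Yes;
run;
```

If these averages are close to the Stata output, the difference from the LSMEANS results comes from how the two methods average over age and gender, not from a different fitted model.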