Why do logistic regression results differ among procedures?


Posted 04-19-2016 12:59 PM
(1461 views)

I was running a very simple logistic regression using Proc Logistic, Proc Genmod, and Proc Glimmix and was surprised to find that my conclusion could change depending on the procedure I used. My sample data set and code are as follows:

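data test;
   input y n x @@;   /* y = events, n = trials, x = predictor; @@ reads several observations per data line */
cards;
6 8 1 4 7 2 4 8 3
3 9 4 3 7 5 1 9 6
;
run;

proc logistic data=test;
   model y/n = x;                                      /* events/trials syntax */
run;

proc genmod data=test;
   model y/n = x / dist=binomial link=logit type3;      /* TYPE3 requests Type 3 tests */
run;

proc glimmix data=test;
   model y/n = x / dist=binomial link=logit s chisq;    /* S = solution table; CHISQ adds chi-square tests */
run;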

I get the same parameter estimates and standard errors from all three procedures. However, the p-values are different for Proc Glimmix. They match between Logistic and Genmod (although the Type 3 results in Genmod do not match the test for the parameter estimate, and I'm not sure why). Almost everything matches between Glimmix and the other two procedures except the p-values. I can get the same p-values if I add the CHISQ option, but why are the p-values based on the F statistic different? My conclusions could change depending on which procedure I use, so if I need Glimmix because I have random effects, I am really concerned that the results I get will not be correct. Any insight into why these differ would be helpful. (I also tried this with more complex data sets, and again Glimmix differed from the other two procedures in its p-values.) Thank you.

1 Reply

The procedures use different test methods, or different test statistics, by default. GENMOD uses a likelihood ratio (LR) test by default; this can be changed to a Wald test (i.e., a chi-squared test statistic) with the WALD option. Both LR and Wald tests use chi-squared statistics, but they are computed from different quantities: the LR statistic compares the log likelihoods of the full and reduced models, while the Wald statistic uses the parameter estimate and its standard error. The Wald test in GENMOD is not adjusted for small sample size; that is, the Wald chi-squared test is equivalent to an F test with infinite denominator df. LOGISTIC gives LR and Wald test statistics but, once again, does not adjust for small (finite) sample sizes (Wald chi-squared = F with infinite denominator df).

GLIMMIX uses the scaled Wald statistic (an F statistic with **finite** denominator df) by default. You can get the regular chi-squared statistic (equivalent to F with infinite denominator df), which is still a Wald test, with the CHISQ option that you showed. GLIMMIX does not have an LR option for testing fixed effects.
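For the single slope in the example (q = 1 tested parameter), the same Wald statistic W leads to two different p-values depending on the reference distribution. The value of ν below is an assumption on our part: it takes GLIMMIX's denominator df to be the residual df, N - rank(X), with N = 6 data rows and two parameters (intercept and x).

```latex
\begin{aligned}
W &= \left( \frac{\hat{\beta}_x}{\widehat{\mathrm{SE}}(\hat{\beta}_x)} \right)^{2}
  && \text{Wald chi-square statistic} \\
p_{\chi^2} &= \Pr\!\left( \chi^2_{1} \ge W \right)
  && \text{LOGISTIC; GENMOD with WALD; GLIMMIX with CHISQ} \\
F &= W / q = W \quad (q = 1)
  && \text{scaled Wald statistic, GLIMMIX default} \\
p_F &= \Pr\!\left( F_{1,\nu} \ge F \right) = \Pr\!\left( |t_{\nu}| \ge \sqrt{W} \right)
  && \nu = N - \operatorname{rank}(X) = 6 - 2 = 4 \ \text{(assumed)}
\end{aligned}
```

Because t with ν df has heavier tails than the standard normal, p_F is larger than the chi-square p-value for any finite ν, and the gap shrinks as ν grows; as ν goes to infinity the two p-values coincide, which is the sense in which the chi-square test is "an F test with infinite denominator df". With only a handful of rows, ν is small, so the gap can be large enough to move a p-value across a cutoff such as 0.05.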

Moreover, if you add random effects to GLIMMIX, the estimation method changes by default from maximum likelihood (ML) to pseudo-likelihood. You can get back to an approximate direct ML fit by using method=laplace.
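A minimal sketch of the syntax, assuming a random effect is present. The example data have no grouping variable, so this uses a hypothetical observation-level random intercept; the variable `obs` is introduced here only for illustration and is not part of the original post. With only six rows the variance component may well be estimated at or near zero, so treat this as a syntax illustration rather than a recommended model.

```sas
/* Hypothetical: give each row its own ID so it can serve as a random-effect subject */
data test2;
   set test;
   obs = _n_;
run;

/* Without METHOD=, a model with a RANDOM statement is fit by pseudo-likelihood;   */
/* METHOD=LAPLACE requests an approximate maximum likelihood fit instead.          */
proc glimmix data=test2 method=laplace;
   class obs;
   model y/n = x / dist=binomial link=logit s chisq;
   random intercept / subject=obs;   /* observation-level random intercept (assumption) */
run;
```

Note that once a RANDOM statement is added, the default denominator df for the F tests is also computed differently, so the F and chi-square p-values will generally still differ.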

Another confusing thing to watch out for: in GENMOD the default Type 3 test is LR, but the Solution table (parameter estimates) still reports Wald-based SEs and CIs. That is why the Type 3 test in your GENMOD output does not match the test for the parameter estimate. As noted above, you can always switch to Wald Type 3 tests (but still with no correction for finite sample size).
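Applied to the original example, a sketch of the two GENMOD variants side by side. With WALD added next to TYPE3, the Type 3 test of x should use the same Wald chi-square as the parameter estimates table (for a single continuous predictor the two tests coincide), and it should then agree with LOGISTIC's Wald test and with GLIMMIX's CHISQ output. Only the MODEL-statement options discussed in this thread are used.

```sas
/* Default: Type 3 tests use likelihood ratio (LR) statistics */
proc genmod data=test;
   model y/n = x / dist=binomial link=logit type3;
run;

/* WALD: Type 3 tests use Wald chi-square statistics instead, */
/* matching the Wald tests in the parameter estimates table   */
proc genmod data=test;
   model y/n = x / dist=binomial link=logit type3 wald;
run;
```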
