Why do logistic regression results differ among procedures?

buhl2752 · Posted 04-19-2016 12:59 PM

I was running a very simplified logistic regression using Proc Logistic, Proc Genmod, and Proc Glimmix and was surprised to find that my conclusion could change depending on the procedure I used. My sample data set and code are as follows:

data test;
  input y n x @@;
  cards;
6 8 1 4 7 2 4 8 3
3 9 4 3 7 5 1 9 6
;

proc logistic data=test; model y/n=x; run;

proc genmod data=test; model y/n=x / dist=binomial link=logit type3; run;

proc glimmix data=test; model y/n=x / dist=binomial link=logit s chisq; run;

I get the same parameter estimates and standard errors from all three procedures. However, the p-values are different for Proc Glimmix. They match between Logistic and Genmod (although the type3 results in Genmod do not match the test for the parameter estimate, and I am not sure why). Almost everything matches between Glimmix and the other two procedures except the p-values. I can get the same p-values if I add the Chisq option, but why are the p-values based on the F test different? My conclusions could change depending on which procedure I use. So if I need to use Glimmix because I have random effects, I am really concerned that the results I get will not be correct. Any insight as to why these are different would be helpful. (I also tried this with more complex datasets, and again Glimmix differed from the other two procedures with respect to the p-values.) Thank you.

lvm · Posted 04-19-2016 03:26 PM

The procedures use different test methods, or different test statistics, as their defaults. For the type 3 tests, GENMOD uses a likelihood ratio (LR) test by default. This can be changed to a Wald test (i.e., a chi-squared test statistic) with the WALD option. Both the LR and Wald tests use a chi-squared statistic, but the two statistics are computed differently, so their values generally differ. The Wald test in GENMOD is not adjusted for small sample size; that is, the Wald chi-squared is the same as an F test with infinite denominator df. LOGISTIC gives LR and Wald test statistics, but once again these are not adjusted for small (finite) sample sizes (Wald chi-squared = F with infinite denominator df). GLIMMIX uses the scaled Wald statistic (an F statistic with finite denominator df) as the default. One can get the regular chi-squared (equivalent to F with infinite denominator df), which is still a Wald test, with the chisq option that you showed. GLIMMIX does not have an LR option for testing fixed effects.
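To see how much the reference distribution alone can matter, here is a minimal sketch that turns one Wald statistic into two p-values. The statistic (4.0) is a made-up illustrative number, not a value from the output above, and the denominator df of 4 assumes residual df of 6 observations minus 2 parameters; check the Den DF column in your own GLIMMIX output.

```sas
/* Same Wald statistic, two reference distributions.          */
/* wald_chisq is a hypothetical value chosen for illustration. */
data pvals;
  wald_chisq = 4.0;   /* hypothetical Wald chi-squared         */
  num_df     = 1;     /* one parameter tested (slope for x)    */
  den_df     = 4;     /* assumed: 6 observations - 2 parameters */

  /* Chi-squared p-value (infinite denominator df) */
  p_chisq = 1 - probchi(wald_chisq, num_df);

  /* F p-value (finite denominator df); F = chi-squared / num_df */
  f_stat = wald_chisq / num_df;
  p_f    = 1 - probf(f_stat, num_df, den_df);
run;

proc print data=pvals noobs;
  var wald_chisq f_stat num_df den_df p_chisq p_f;
run;
```

With so few denominator df, p_f comes out noticeably larger than p_chisq; as den_df grows, the two converge.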

Moreover, if you add random effects in GLIMMIX, the default estimation method changes from maximum likelihood to pseudo-likelihood. You can get back to an approximate direct MLE by using method=laplace.
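As a syntax sketch only: the grouping variable block below is invented (it simply pairs consecutive rows of the sample data) so that there is something to put in a RANDOM statement. With three made-up groups the variance estimate is not meaningful and the fit may sit on a boundary; the point is just where method=laplace goes.

```sas
/* Sample data from the question plus a HYPOTHETICAL grouping variable */
data test2;
  input y n x @@;
  block = ceil(_n_ / 2);   /* invented groups 1,1,2,2,3,3; illustration only */
  datalines;
6 8 1 4 7 2 4 8 3
3 9 4 3 7 5 1 9 6
;

/* Default with a random effect: pseudo-likelihood estimation */
proc glimmix data=test2;
  class block;
  model y/n = x / dist=binomial link=logit s chisq;
  random intercept / subject=block;
run;

/* Laplace approximation to the marginal likelihood */
proc glimmix data=test2 method=laplace;
  class block;
  model y/n = x / dist=binomial link=logit s chisq;
  random intercept / subject=block;
run;
```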

Another confusing thing to watch out for: in GENMOD, the default type 3 test is LR, but one still gets Wald-based SEs, CIs, and tests for the parameter estimates, which is why the type 3 result does not match the parameter estimate test in your output. As noted above, one can always switch to Wald type 3 tests (but still with no correction for finite sample size).
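For reference, a sketch of how the three procedures could be lined up on the same Wald chi-squared scale using the sample data from the question. The WALD option goes alongside TYPE3 in the GENMOD MODEL statement; ddfm=none in GLIMMIX is an alternative to the chisq option that treats the denominator df as infinite, so its F and t tests are referred to chi-squared and normal distributions.

```sas
data test;
  input y n x @@;
  datalines;
6 8 1 4 7 2 4 8 3
3 9 4 3 7 5 1 9 6
;

/* LOGISTIC: Wald chi-squared for the parameter estimate by default */
proc logistic data=test;
  model y/n = x;
run;

/* GENMOD: WALD switches the type 3 test from LR to Wald */
proc genmod data=test;
  model y/n = x / dist=binomial link=logit type3 wald;
run;

/* GLIMMIX: ddfm=none uses infinite denominator df for the tests */
proc glimmix data=test;
  model y/n = x / dist=binomial link=logit s ddfm=none;
run;
```

This only makes the reported p-values agree; it does not settle whether the finite-df F test or the chi-squared test is the better choice for a small data set.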
