Solved: Why do PROC GENMOD and glm() in R show different standard errors?

chaog · Posted 04-01-2019 07:26 PM

I would like to fit a Tweedie GLM to the data. However, even though PROC GENMOD from SAS and glm() from R gives the same coefficient estimates, they give quite different standard error. What it the reason behind this?

If we run the following R code,

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- as.factor(mydata$rank)
mydata$rank <- relevel(mydata$rank, ref = "1")

mytweedie <- glm(admit ~ gre + gpa + rank, data = mydata,
+     family = statmod::tweedie(var.power = 1.5, link.power = 0))
summary(mytweedie)

R shows that

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.4957360  0.7906423  -4.421 1.27e-05 ***
gre          0.0016542  0.0007611   2.173 0.030345 *  
gpa          0.5274319  0.2307913   2.285 0.022825 *  
rank2       -0.3265264  0.2167527  -1.506 0.132754    
rank3       -0.7947102  0.2364447  -3.361 0.000852 ***
rank4       -0.9818197  0.2867970  -3.423 0.000683 ***

However, if we run the same model using the same data on SAS with PROC GENMOD,

PROC GENMOD data=logistic_test DESCENDING;
    CLASS rank(REF = "1");
    MODEL admit = gre gpa rank / DIST=tweedie(p=1.5);
RUN;

The result is

Parameter . DF    Estimate   Standard Error
Intercept     1       -3.4958     1.1387
gre               1        0.0017      0.0011
gpa              1       0.5274      0.3176
......

StatDave · Posted 04-02-2019 11:18 AM

It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses.

View solution in original post

Kurt_Bremser · Posted 04-02-2019 03:59 AM

Which R module/library are you using? Is it certified by a governing body like the FDA, so you can trust it?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Rick_SAS · Posted 04-02-2019 10:20 AM

It is probably a scale issue. Notice that the ratio of (R StdErr) / (SAS StdErr) is approximately the constant value 1.4(ish). You don't show all the output, but presumably this is the estimate for the Tweedie dispersion (scale) value. Multiply your R standard errors by the dispersion estimate to get the SAS values (or divide the SAS estimates by the scale).

chaog · Posted 04-02-2019 07:58 PM

I read into the document and apparently PROC GENMOD use ML to estimate the scale. My question is, what is scale? I am familiar with the notation of "dispersion" and apparently, "scale" and "dispersion" are different in SAS. The following document states that "SAS works with scale" but doesn't give the definition of "scale":

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_genmod_sect...

Rick_SAS · Posted 04-02-2019 08:32 PM

The documentation page has a table (the headers are "Distribution" and "Scale") which shows how the dispersion parameter in the exponential family relates to the scale parameter for various distributions. A "scale parameter" is merely the parameter in the distribution that controls the variance, such as sigma for Normal and lambda for Poisson. The density equations at the top of the page

express the variance in terms of the scale parameters, not the dispersion. The "estimate of scale" in the parameter estimates table enables you to go back and forth between the two parameters.

StatDave · Posted 04-02-2019 11:18 AM

It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses.

Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

Re: Why do PROC GENMOD and glm() in R show different standard errors?

SAS Innovate 2025: Call for Content