I would like to fit a Tweedie GLM to the data. However, even though PROC GENMOD from SAS and glm() from R gives the same coefficient estimates, they give quite different standard error. What it the reason behind this?
If we run the following R code,
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv") mydata$rank <- as.factor(mydata$rank) mydata$rank <- relevel(mydata$rank, ref = "1") mytweedie <- glm(admit ~ gre + gpa + rank, data = mydata, + family = statmod::tweedie(var.power = 1.5, link.power = 0)) summary(mytweedie)
R shows that
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -3.4957360 0.7906423 -4.421 1.27e-05 *** gre 0.0016542 0.0007611 2.173 0.030345 * gpa 0.5274319 0.2307913 2.285 0.022825 * rank2 -0.3265264 0.2167527 -1.506 0.132754 rank3 -0.7947102 0.2364447 -3.361 0.000852 *** rank4 -0.9818197 0.2867970 -3.423 0.000683 ***
However, if we run the same model using the same data on SAS with PROC GENMOD,
PROC GENMOD data=logistic_test DESCENDING;
CLASS rank(REF = "1");
MODEL admit = gre gpa rank / DIST=tweedie(p=1.5);
RUN;
The result is
Parameter . DF Estimate Standard Error Intercept 1 -3.4958 1.1387 gre 1 0.0017 0.0011 gpa 1 0.5274 0.3176 ......
It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses.
Which R module/library are you using? Is it certified by a governing body like the FDA, so you can trust it?
It is probably a scale issue. Notice that the ratio of (R StdErr) / (SAS StdErr) is approximately the constant value 1.4(ish). You don't show all the output, but presumably this is the estimate for the Tweedie dispersion (scale) value. Multiply your R standard errors by the dispersion estimate to get the SAS values (or divide the SAS estimates by the scale).
The documentation page has a table (the headers are "Distribution" and "Scale") which shows how the dispersion parameter in the exponential family relates to the scale parameter for various distributions. A "scale parameter" is merely the parameter in the distribution that controls the variance, such as sigma for Normal and lambda for Poisson. The density equations at the top of the page
express the variance in terms of the scale parameters, not the dispersion. The "estimate of scale" in the parameter estimates table enables you to go back and forth between the two parameters.
It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.