BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chaog
Calcite | Level 5

I would like to fit a Tweedie GLM to the data. However, even though PROC GENMOD from SAS and glm() from R gives the same coefficient estimates, they give quite different standard error. What it the reason behind this?

 

If we run the following R code,

 

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- as.factor(mydata$rank)
mydata$rank <- relevel(mydata$rank, ref = "1")

mytweedie <- glm(admit ~ gre + gpa + rank, data = mydata,
+     family = statmod::tweedie(var.power = 1.5, link.power = 0))
summary(mytweedie)

R shows that

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.4957360  0.7906423  -4.421 1.27e-05 ***
gre          0.0016542  0.0007611   2.173 0.030345 *  
gpa          0.5274319  0.2307913   2.285 0.022825 *  
rank2       -0.3265264  0.2167527  -1.506 0.132754    
rank3       -0.7947102  0.2364447  -3.361 0.000852 ***
rank4       -0.9818197  0.2867970  -3.423 0.000683 ***

However, if we run the same model using the same data on SAS with PROC GENMOD,

PROC GENMOD data=logistic_test DESCENDING;
    CLASS rank(REF = "1");
    MODEL admit = gre gpa rank / DIST=tweedie(p=1.5);
RUN;

The result is

Parameter . DF    Estimate   Standard Error
Intercept     1       -3.4958     1.1387
gre               1        0.0017      0.0011
gpa              1       0.5274      0.3176
......
1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses. 

View solution in original post

5 REPLIES 5
Rick_SAS
SAS Super FREQ

It is probably a scale issue. Notice that the ratio of (R StdErr) / (SAS StdErr) is approximately the constant value 1.4(ish). You don't show all the output, but presumably this is the estimate for the Tweedie dispersion (scale) value. Multiply your R standard errors by the dispersion estimate to get the SAS values (or divide the SAS estimates by the scale).

 

 

chaog
Calcite | Level 5
I read into the document and apparently PROC GENMOD use ML to estimate the scale. My question is, what is scale? I am familiar with the notation of "dispersion" and apparently, "scale" and "dispersion" are different in SAS. The following document states that "SAS works with scale" but doesn't give the definition of "scale":

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_genmod_sect...
Rick_SAS
SAS Super FREQ

The documentation page has a table (the headers are "Distribution" and "Scale") which shows how the dispersion parameter in the exponential family relates to the scale parameter for various distributions. A "scale parameter" is merely the parameter in the distribution that controls the variance, such as sigma for Normal and lambda for Poisson. The density equations at the top of the page

express the variance in terms of the scale parameters, not the dispersion. The "estimate of scale" in the parameter estimates table enables you to go back and forth between the two parameters.

StatDave
SAS Super FREQ

It is not immediately clear how R's glm function estimates the Tweedie scale parameter, but it reports a scale estimate of 1.381162. This is close to the estimate you get if you use the SCALE=D option in the MODEL statement of PROC GENMOD. If you specify SCALE=1.381162 NOSCALE options in the MODEL statement in PROC GENMOD, then you get very similar standard errors. Note that GENMOD does not require you to specify the Tweedie power parameter - by default it estimates it and the scale parameter using maximum likelihood. If you just specify DIST=TWEEDIE as the only option, you get a model with log likelihood = -237.8 as compared to the log likelihood of the power=1.5 model which is -424.9. Also, you should omit the DESCENDING option - this is only used for binary responses. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2964 views
  • 0 likes
  • 4 in conversation