beahern
Calcite | Level 5

I am trying to figure out what an appropriate standardized effect size measure would be for the parameter estimates in a PROC GENMOD model run with the following syntax:

 

PROC GENMOD DATA=mydata;
model adjusted_hits = age expertise sequence ageseq;
run;

 

adjusted hits values: -1, -0.5, 0, 0.5, 1

age: centered at median

expertise: moderate=-0.5, high=0.5

sequence: easy=-0.5, hard=0.5

ageseq: interaction of age and sequence

 

How are model effect sizes calculated?

 

Thank you very much for your help.

 

 

 

1 ACCEPTED SOLUTION

SteveDenham
Jade | Level 19

After doing a fair amount of digging through CrossValidated for effect size in generalized linear models, I came back to see what distribution and link you are using.  It appears from your code that you are using the default identity link and normal distribution.  If that is the case, try running PROC GLM like this:

 

PROC GLM DATA=mydata;
model adjusted_hits = age expertise sequence ageseq / effectsize;
run; quit;

If I am incorrect on the link and distribution assumptions, this becomes much more difficult, as the usual components of effect size estimates (sums of squares, for instance) are not applicable to likelihood-based methods.  There are two approaches that I came across that may be helpful.

The first is a likelihood ratio test.  For PROC GENMOD you can obtain this by specifying TYPE3 as an option in the MODEL statement.  Chi-square values are obtained for each factor based on the change in likelihood compared to the full model.

The second is to look at the change in likelihood due to a predetermined change in magnitude of the variable in question.  This would be a multi-step process (as I can't figure out an easy way to get partial likelihood values):

Step 1: Fit the full model.

Step 2: Fit the model with the variable in question fixed at one value.  You can set this up in a DATA step.  Save the log likelihood from this run.

Step 3: Fit the model with the variable in question fixed at a second value, different from the one in Step 2.  Save the log likelihood from this run.

Step 4: Create the ratio of the change in log likelihood to the change in parameter value.

You can then rank the variables as to their impact on the fit to your data.  There is a fair amount of R code floating around, including packages, to do this sort of thing.
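For the first approach, a minimal sketch of requesting the likelihood ratio (Type 3) tests in PROC GENMOD for the model from the question might look like the following.  The DIST= and LINK= options are written out only for clarity (they match the GENMOD defaults), and the output data set names work.lr_tests and work.fit_full are made up for illustration.

```sas
/* Sketch: likelihood ratio (Type 3) tests for each term in the model.   */
/* work.lr_tests and work.fit_full are hypothetical output data sets.    */
ods output Type3=work.lr_tests
           ModelFit=work.fit_full;

proc genmod data=mydata;
   model adjusted_hits = age expertise sequence ageseq
         / dist=normal link=identity type3;
run;

/* The ModelFit table includes the log likelihood of the full model,     */
/* which is the quantity to save if you go on to compare refitted models. */
proc print data=work.fit_full;
run;

/* One row per term: chi-square statistic and p-value from the LR test.  */
proc print data=work.lr_tests;
run;
```

Capturing the tables with ODS OUTPUT rather than reading them off the listing makes it easier to line up the chi-square values across terms or across refits.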

 

SteveDenham

 


4 REPLIES 4

pink_poodle
Barite | Level 11

@SteveDenham ,

If I use a log function to normalize the distribution of my outcome, can I look at effect size the same way (using GLM model statement's effectsize option) on the normalized outcome?

Many thanks!

SteveDenham
Jade | Level 19

The answer is yes, with a big qualification.  Taking the log of a value and analyzing it via GLM is NOT the same as using a log link in GENMOD.  The first is really assuming a lognormal distribution, while the latter assumes a distribution from the exponential family where the variance is some function of the expected value.  A good way to see this is to run these 3 on the same data: GLM on log transformed data, GENMOD specifying a log-normal distribution and GENMOD with a log link.  The results may surprise you (they may not if I am saying something obvious to you).
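A rough sketch of that three-way comparison is below.  Everything named here is an assumption for illustration: the data set work.mydata2, a strictly positive outcome y, and predictors x1 and x2 are hypothetical.  I am not aware of a lognormal keyword for DIST= in GENMOD, so the sketch fits the lognormal model in PROC GLIMMIX instead, and it uses a gamma distribution for the log-link fit purely as an example.

```sas
/* Hypothetical inputs: work.mydata2 with positive outcome y and          */
/* predictors x1, x2. None of these names come from the original thread.  */
data work.logged;
   set work.mydata2;
   log_y = log(y);            /* requires y > 0 */
run;

/* Fit 1: ordinary linear model on the log-transformed outcome,           */
/* with the effect size table requested.                                  */
proc glm data=work.logged;
   model log_y = x1 x2 / effectsize;
run; quit;

/* Fit 2: lognormal response on the original outcome                      */
/* (assumption: done in GLIMMIX rather than GENMOD).                      */
proc glimmix data=work.mydata2;
   model y = x1 x2 / dist=lognormal solution;
run;

/* Fit 3: log link; gamma is only an example distribution choice.         */
proc genmod data=work.mydata2;
   model y = x1 x2 / dist=gamma link=log type3;
run;
```

Comparing the parameter estimates and tests from the three fits side by side shows how much the choice between transforming the response and using a link function matters for a given data set.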

 

So effect size measures like eta squared and omega squared obtained from GLM on the log transformed data assume homogeneous variances in the log transformed space.  Thus ranking variables by effect size would tell you the relative importance of a variable that is predictive of what is essentially log(Y) + error(Normal, 0, sigma**2).  But if your variances are not homogeneous in the log transformed space (as with Poisson, negative binomial, or gamma distributed variables), the family of effect sizes from GLM will likely rank variables differently.  To get around this you might try the approach given at this link on StackExchange.  I interpret this as: hold all of the variables except one at fixed values (with their solution as previously found).  For that one variable, score the model at two values and get the difference in the predicted values.  Repeat as needed until all of the variables have an estimate.  I believe this is what the R package emmeans uses for nlme results, but I could easily be wrong.
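One way that scoring idea could be sketched in SAS is with the STORE statement and PROC PLM.  The data set work.mydata2, the count outcome y, the predictors x1 and x2, the Poisson model, and the two scoring values (-0.5 and 0.5) are all hypothetical choices for illustration, not something taken from the linked answer.

```sas
/* Hypothetical model: count outcome y, predictors x1 and x2.             */
proc genmod data=work.mydata2;
   model y = x1 x2 / dist=poisson link=log;
   store work.fit_y;          /* save the fitted model for scoring */
run;

/* Scoring grid: x2 held fixed at 0 (assumed reference value),            */
/* x1 evaluated at two values.                                            */
data work.scoregrid;
   x2 = 0;
   do x1 = -0.5, 0.5;
      output;
   end;
run;

/* Predicted means on the data scale (ILINK applies the inverse link).    */
proc plm restore=work.fit_y;
   score data=work.scoregrid out=work.pred predicted=p_hat / ilink;
run;

/* Difference in predicted value between the two settings of x1.          */
data work.effect_x1;
   set work.pred end=last;
   retain p_low;
   if _n_ = 1 then p_low = p_hat;
   if last then do;
      diff = p_hat - p_low;
      output;
   end;
   keep diff;
run;
```

Repeating the grid and scoring steps with x1 held fixed and x2 varied gives the corresponding difference for x2, and the differences can then be compared across variables.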

 

SteveDenham

pink_poodle
Barite | Level 11

@SteveDenham ,

Thank you very much for a great explanation!
