Solved: Re: Welch's ANOVA With Geometric Means

mariko5797 · Posted 08-04-2021 01:43 PM

I am not very familiar with either Welch's ANOVA or geometric means, so a detailed response is appreciated.

I am interested in comparing concentrations among BMI groups (not obese, obese, severely obese) at two different time points (after start of treatment, after end of treatment). Since we are assuming unequal variances, I thought Welch's ANOVA would work well to check whether at least one group is different. However, since the geometric mean and geometric CV are what's being reported, I thought I should compare geometric means rather than arithmetic means.

(1) Is there a way to compare geometric means using ANOVA? I know the geometric mean is the exponentiated mean of the log-transformed data, but I am unsure how to apply that to ANOVA.

(2) Is it even necessary to use the geometric mean if I use Welch's ANOVA?

(3) Is there a simple way to store the p-value from the output table, so I can add it to a summary table later on? My plan was to use PROC SQL to add a row to the bottom of my "Data Have" with a p-value.

Generally, I want a table like this (will use PROC REPORT):

My datasets are organized like this:

xx represents some numerical value.

Thank you in advance!

sbxkoenk · Posted 08-04-2021 05:59 PM

Hello,

This post may be of interest to you:

ANOVA using geometric mean

https://communities.sas.com/t5/Statistical-Procedures/ANOVA-using-geometric-mean/m-p/303654#M16143

Koen

SteveDenham · Posted 08-05-2021 11:30 AM

So long as all your values are greater than zero, you can log-transform them and run Welch's ANOVA to test whether the means of the logged values differ. Since the back-transformation is monotonic, you are getting p-values for differences between the geometric means.
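
A minimal sketch of the log-transform-then-Welch route described above, in case it helps. It assumes a long dataset named have with variables concentration, bmi, and time (the names used elsewhere in this thread); have_log, log_conc, and welch_p are made-up names for illustration. The BY statement runs a separate test at each time point, and the ODS OUTPUT statement saves the Welch table so its p-value can be merged into a summary table later.

```sas
/* Log-transform; non-positive concentrations are left missing */
data have_log;
    set have;
    if concentration > 0 then log_conc = log(concentration);
run;

/* BY-group processing needs the data sorted by time */
proc sort data=have_log;
    by time;
run;

/* One-way Welch ANOVA on the log scale, separately at each time point */
proc glm data=have_log;
    by time;
    class bmi;
    model log_conc = bmi;
    means bmi / welch;
    ods output Welch=welch_p;   /* Welch table, including its p-value */
run;
quit;
```

Before joining welch_p to a report table, it is worth checking its rows and column names with PROC PRINT or PROC CONTENTS rather than relying on the names assumed here.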

However, I would encourage you to make a leap ahead from the 1950s (when Welch presented his method) to something that uses a generalized linear model and lets you model the heteroscedasticity (PROC GLIMMIX with the GROUP= option in the RANDOM statement and a lognormal distribution).

SteveDenham

mariko5797 · Posted 08-05-2021 12:11 PM

Thank you.

I am not familiar with PROC GLIMMIX and will have to look into it. You mentioned using the GROUP= option in the RANDOM statement. Would I be making my BMI groups random? I've had limited experience working with mixed effects models in school, so I'm not great at assessing what should be treated as fixed vs. random.

SteveDenham · Posted 08-05-2021 01:26 PM

Consider this code:

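proc glimmix data=have;
   ods output lsmeans=lsmeans;   /* save the LS-means (log scale) to a dataset */
   class bmi time;
   model concentration = bmi time bmi*time / dist=lognormal ddfm=satterthwaite;
   random _residual_ / group=bmi;   /* separate residual variance for each BMI group */
   lsmeans bmi time bmi*time / cl;   /* request confidence limits */
run;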

If you have the first and second observation for each subject, you could change this to:

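proc glimmix data=have;
   ods output lsmeans=lsmeans;   /* save the LS-means (log scale) to a dataset */
   class bmi time sub_ID;
   model concentration = bmi time bmi*time / dist=lognormal ddfm=satterthwaite;
   random _residual_ / group=bmi subject=sub_ID;   /* sub_ID identifies each subject's observations */
   lsmeans bmi time bmi*time / cl;   /* request confidence limits */
run;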

This requires a "long" version of the data, where each row has one sub_ID, one BMI group, and one time point. You could add a test for homogeneity of variance, but these two blocks already come close to the unequal-variance assumption behind Welch's test. If the variances differ grossly by group, the Satterthwaite approximation for the denominator degrees of freedom is appropriate.

This produces a dataset (lsmeans) that holds the estimates and the 95% confidence bounds on the natural log scale. To get geometric means and their bounds, simply exponentiate these values in a DATA step.
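
The back-transformation can be sketched as follows. This assumes the lsmeans dataset created by the ODS OUTPUT statement in the code above, with the Estimate, Lower, and Upper columns that GLIMMIX writes when the CL option is requested; geo_means, geo_mean, geo_lower, and geo_upper are hypothetical names chosen for illustration.

```sas
/* Back-transform log-scale LS-means to geometric means with 95% limits */
data geo_means;
    set lsmeans;
    geo_mean  = exp(Estimate);   /* geometric mean */
    geo_lower = exp(Lower);      /* lower confidence limit */
    geo_upper = exp(Upper);      /* upper confidence limit */
run;

proc print data=geo_means;
run;
```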

SteveDenham

mariko5797 · Posted 08-05-2021 03:53 PM

If I only care that at least one BMI group is different, I assume I can go off of the Type III Tests of Fixed Effects table.
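
One possible way to capture that table for a later summary, sketched under the assumption that the model is the first GLIMMIX block from earlier in the thread: an ODS OUTPUT statement with the Tests3 table name writes the Type III tests to a dataset. The names tests3 and bmi_p are hypothetical, and the column names kept below should be confirmed with PROC CONTENTS on your own output.

```sas
proc glimmix data=have;
    class bmi time;
    model concentration = bmi time bmi*time / dist=lognormal ddfm=satterthwaite;
    random _residual_ / group=bmi;
    ods output Tests3=tests3;   /* Type III Tests of Fixed Effects */
run;

/* Keep only the overall BMI test so it can be joined to a report table */
data bmi_p;
    set tests3;
    where upcase(Effect) = 'BMI';
    keep Effect NumDF DenDF FValue ProbF;
run;
```

From there, bmi_p could be appended or joined to the summary dataset with PROC SQL or a DATA step before the PROC REPORT call.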

How would you describe this type of test? That is, before I had the footnote on my table "BMI groups were compared using Welch's ANOVA." Since this is no longer Welch's ANOVA, what would the test be called?

SteveDenham · Posted 08-06-2021 08:20 AM

ANOVA with nonhomogeneous errors.

Might need to spell out ANOVA as analysis of variance.

SteveDenham
