Hi,

When using PROC GLM or PROC MIXED with an LSMEANS statement, is there a way to obtain the standard deviation (SD) for each lsmean value instead of the standard error (SE)?

Technically, given the SE, I can calculate the SD. But for whatever reason, I am getting an identical SE for every lsmean across all of the groups being compared. Using those SEs would therefore yield an identical SD for every group, which does not make sense. (Based on the raw data, I know that the different groups have different standard deviations.)

Is there a way to properly obtain an SD for each lsmean? If yes, please explain how. If not, why not?
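
For reference, the per-group SDs I mention can be read directly off the raw data, e.g. with PROC MEANS; a minimal sketch, where mydata, y, and mygroupvar are placeholder names:

proc means data=mydata mean std stderr n;
  class mygroupvar;  /* one row of statistics per group */
  var y;             /* response variable */
run;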

Thanks in advance.

3 REPLIES
Dale
Pyrite | Level 9 (Accepted Solution)
In order to have identical SEs for all lsmeans, you must have a balanced design (equal number of observations for each group and identical distribution of any other covariates that appear in your model). Along with the balanced design, you must be assuming that the residual variance is the same for all observations.
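
To see why, consider the simplest balanced case: a one-way design with a common residual variance sigma^2 and n observations in every group. Each lsmean is then just a group mean, so its standard error is

  se(lsmean) = sqrt(sigma^2 / n)

which is identical for every group; the SE depends only on the pooled estimate of sigma^2 and the group size, not on anything group-specific.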

The assumption that the residual variance is the same for all observations is a standard assumption. However, it sounds like you don't like that assumption of a common residual variance. Employing the MIXED procedure, you can relax this assumption.

Suppose that you have fitted a model such as that shown below:

proc mixed data=mydata;
  class mygroupvar;          /* treat the grouping variable as a classification effect */
  model y = mygroupvar / s;  /* s requests the fixed-effects solution */
  lsmeans mygroupvar;        /* least-squares mean for each group */
run;

This model assumes that the residual variance is the same for every observation. That means that the residual variance for an observation from mygroupvar=1 is the same as the residual variance for an observation from mygroupvar=2 (is the same as the residual variance for an observation from mygroupvar=3 ...). But if you believe that the residual variance is different for observations having different levels of mygroupvar, then the above model is not correct. Instead, you should be fitting the model:

proc mixed data=mydata;
  class mygroupvar;
  model y = mygroupvar / s ddfm=satterthwaite;  /* Satterthwaite denominator df */
  repeated / group=mygroupvar;                  /* a separate residual variance per group */
  lsmeans mygroupvar;
run;

Note that we have added a REPEATED statement in which the group= option specifies that the residual variance differs across observations with different values of mygroupvar. The lsmean for each level of mygroupvar will now have a different SE, because each group has its own residual variance.

In addition to including the REPEATED statement with the group= option, we add ddfm=satterthwaite to the MODEL statement. Note that if you had just two groups and no covariates, you would have the classic Behrens-Fisher problem. The t-test procedure uses a Satterthwaite computation for the degrees of freedom when testing for a difference of means when the assumption of constant variance is violated. We do exactly the same with the MIXED procedure. However, with the MIXED procedure, we can generalize from a simple design with just two groups and no covariates to a design with multiple group levels as well as covariates.
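
To see the two-group correspondence concretely, PROC TTEST reports the pooled (equal-variance) and Satterthwaite (unequal-variance) results side by side; a minimal sketch, reusing the same placeholder names with a two-level mygroupvar:

proc ttest data=mydata;
  class mygroupvar;  /* must have exactly two levels here */
  var y;             /* output includes both Pooled and Satterthwaite tests */
run;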

So, forget trying to get an SD value for each of your lsmeans. Instead, alter your model so that it properly accommodates a residual variance that differs across groups.
deleted_user
Not applicable
Hi Dale, I really appreciate your detailed reply! Thanks for your kindness.

For my analysis, prior to running SAS, I checked my data against the ANOVA assumptions, and I know that the groups being compared have variances that are not significantly different (though of course not identical). What I didn't realize was that, by coding the model as in your first example, I was assuming the variances to be identical across groups:

proc mixed data=mydata;
  class mygroupvar;
  model y = mygroupvar / s;
  lsmeans mygroupvar;
run;

Although I verified that the group variances are not significantly different, they are not identical, so perhaps it is not exactly correct for me to code the model as above.

Further, if adding a REPEATED statement allows me to compare groups with different variances, am I correct in thinking, "I'll just always use the REPEATED statement and not bother with checking variance homogeneity among groups"? Or are there certain conditions that must hold for the REPEATED statement to be used correctly?

Thanks again, Dale. Best regards,
Dale
Pyrite | Level 9
It is a standard assumption of regression analysis to fit a model characterized as follows:

Y{i} = X{i}*beta + epsilon{i}

        epsilon{i} ~ iid N(0, sigma^2)

That is, we usually do assume that the residuals are independently and identically distributed with a common variance. Of course, if you examine the residuals in any finite sample, you will almost surely find some difference in the observed residual variance between any two or more subsets of the collected data.

If the standard assumptions are correct (or at least, a good approximation of the truth), then you are usually best off employing the standard assumptions. If you complicate the model by introducing unnecessary terms into the variance structure, you reduce the power of your model.

So, no, you do not want to start incorporating a REPEATED statement in every analysis you run. Moreover, it sounds to me (without having seen any information about your experimental design or the structure of the data you have collected) that you are best off accepting the underlying model's assertion that the standard errors of the lsmeans are the same for all groups, even if the empirical residual variances differ somewhat across your groups.
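
If you want a more formal basis for choosing between the two variance structures, one standard option is a likelihood-ratio test: fit the model with and without the group= option and compare the "-2 Res Log Likelihood" values from the Fit Statistics tables. A minimal sketch, again with the placeholder names used earlier in the thread:

proc mixed data=mydata;  /* Model 1: common residual variance (1 covariance parameter) */
  class mygroupvar;
  model y = mygroupvar / s;
run;

proc mixed data=mydata;  /* Model 2: a separate residual variance per group */
  class mygroupvar;
  model y = mygroupvar / s ddfm=satterthwaite;
  repeated / group=mygroupvar;
run;

/* Because the fixed effects are identical, the REML likelihoods are
   comparable: the difference in -2 Res Log Likelihood is approximately
   chi-square with (number of groups - 1) degrees of freedom under the
   null hypothesis of equal variances. */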
