- Can you get standard deviations for lsmeans, inste...

03-11-2010 10:20 AM

Hi,

When using proc glm or proc mixed with the lsmeans statement, is there a way to obtain the standard deviation (SD) for each lsmean value instead of the standard error (SE)?

Technically, given the SE, I am able to calculate the SD. But for whatever reason, I am getting identical SEs for the lsmeans of all the groups being compared. Thus, using the SE would result in an identical SD for all groups, which does not make sense. (Based on the raw data, I know that different groups have different standard deviations.)

Is there a way to properly obtain SD for lsmean values? If yes, please explain how. If no, why not??

Thanks in advance.

03-15-2010 04:42 PM

In order to have identical SEs for all lsmeans, you must have a balanced design (equal number of observations for each group and identical distribution of any other covariates that appear in your model). Along with the balanced design, you must be assuming that the residual variance is the same for all observations.
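To make this concrete, here is a sketch for the simplest case, assuming a one-way model with no covariates (with covariates the expression is more involved). The symbols n_g (number of observations in group g) and MSE (the pooled residual mean square) are notation introduced here for illustration:

```latex
\mathrm{SE}(\hat{\mu}_g) = \sqrt{\frac{\hat{\sigma}^2}{n_g}},
\qquad \hat{\sigma}^2 = \mathrm{MSE}
```

Because the same pooled estimate appears for every group, equal n_g gives equal SEs. It also means that multiplying an lsmean SE by the square root of n_g recovers the pooled residual SD, not the SD of that group's raw data.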

The assumption that the residual variance is the same for all observations is a standard assumption. However, it sounds like you don't like that assumption of a common residual variance. Employing the MIXED procedure, you can relax this assumption.

Suppose that you have fitted a model such as that shown below:

    proc mixed data=mydata;
      class mygroupvar;
      model y = mygroupvar / s;
      lsmeans mygroupvar;
    run;

This model assumes that the residual variance is the same for every observation. That means that the residual variance for an observation from mygroupvar=1 is the same as the residual variance for an observation from mygroupvar=2 (is the same as the residual variance for an observation from mygroupvar=3 ...). But if you believe that the residual variance is different for observations having different levels of mygroupvar, then the above model is not correct. Instead, you should be fitting the model:

    proc mixed data=mydata;
      class mygroupvar;
      model y = mygroupvar / s ddfm=satterthwaite;
      repeated / group=mygroupvar;
      lsmeans mygroupvar;
    run;

Note that we have added a REPEATED statement in which we employ the group= option to specify that the residual variance differs for observations which have different values of the variable mygroupvar. The lsmean values for each level of mygroupvar will now have different s.e. because they have different residual variance.
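If you want to look at the group-specific residual variances that this model estimates, one option is to capture the covariance parameter table with ODS OUTPUT. The following is only a sketch: the data set names cp, lsm and group_sd and the variable resid_sd are made up for illustration, and it assumes the same mydata, y and mygroupvar as above.

```sas
/* Capture the covariance parameter estimates and the lsmeans as data sets */
ods output CovParms=cp LSMeans=lsm;

proc mixed data=mydata;
  class mygroupvar;
  model y = mygroupvar / s ddfm=satterthwaite;
  repeated / group=mygroupvar;   /* one residual variance per group */
  lsmeans mygroupvar;
run;

/* Each row of cp holds one group's residual variance in Estimate; */
/* its square root is a model-based residual SD for that group.   */
data group_sd;
  set cp;
  resid_sd = sqrt(Estimate);
run;

proc print data=group_sd;
run;
```

The resid_sd values are residual standard deviations under the fitted model, one per level of mygroupvar. They are not standard deviations of the lsmeans themselves.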

In addition to the REPEATED statement with the group= option, we also add ddfm=satterthwaite to the MODEL statement. Note that if you had just two groups and no covariates, you would have the classic Behrens-Fisher problem. The TTEST procedure uses a Satterthwaite computation for the degrees of freedom when testing for a difference of means when the assumption of constant variance is violated. We do exactly the same with the MIXED procedure. However, with the MIXED procedure, we can generalize from a simple design with just two groups and no covariates to a design with multiple group levels as well as covariates.
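For the two-group case, the comparison could be sketched with PROC TTEST as below. The WHERE statement is an assumption for illustration only: it presumes mygroupvar is numeric and that you want levels 1 and 2. PROC TTEST needs exactly two levels of the CLASS variable.

```sas
/* Two-group comparison; the output reports both the pooled       */
/* (equal variance) and the Satterthwaite (unequal variance) test */
proc ttest data=mydata;
  where mygroupvar in (1, 2);   /* assumed numeric coding, two levels only */
  class mygroupvar;
  var y;
run;
```

The Satterthwaite row of that output should correspond to what the MIXED model with group= and ddfm=satterthwaite gives for the difference between the same two groups.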

So, forget trying to get at a SD value for each of your lsmeans. Alter your model to properly deal with the assumption that the residual variance differs across groups.

03-18-2010 02:20 PM

Hi Dale, I really appreciate your detailed reply! Thanks for your kindness.

For my analysis, prior to running SAS, I checked my data for ANOVA assumptions, and I know that the groups being compared have variances that are not significantly different (but of course, not identical). What I didn’t realize was that I was assuming variances to be ‘identical’ across groups by coding the model as in your first example:

    proc mixed data=mydata;
      class mygroupvar;
      model y = mygroupvar / s;
      lsmeans mygroupvar;
    run;

Although I ensured that group variances are not significantly different, since they are not identical, maybe it is not exactly correct for me to be coding as above.

Further, if adding a REPEATED statement allows me to compare groups with different variances, am I correct in thinking, “I’m going to use the REPEATED statement always, and I’m not going to bother with ensuring variance homogeneity among groups”? Or are there certain conditions that have to be present in order to use the REPEATED statement correctly?

Thanks again Dale,
Best regards,

03-19-2010 06:52 PM

A standard assumption of regression analysis is that the data follow a model of the form:

Y{i} = X{i}*beta + epsilon{i}

epsilon{i} ~ iid N(0, sigma^2)

That is, we usually do assume that the residuals are independently and identically distributed and have a common variance. Of course, if you examine the residuals in any finite set, you will almost surely find that there is some difference in observed residual variance for any two or more subsets of the collected data.

If the standard assumptions are correct (or at least, a good approximation of the truth), then you are usually best off employing the standard assumptions. If you complicate the model by introducing unnecessary terms into the variance structure, you reduce the power of your tests.

So, no, you do not want to start incorporating a REPEATED statement in every analysis which you run. Moreover, it sounds to me (without having been presented any information about experimental design and the structure of the data which you have collected) that you are best off accepting that the underlying model asserts that the standard errors of the lsmeans are the same for all groups - even if empirical residual variances might differ across your groups.
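If you do want a formal check of whether separate variances are worth the extra parameters, one possible approach is a likelihood ratio comparison of the two models. This is a sketch under stated assumptions, not a prescription: the data set names fit_common, fit_hetero and lrt are made up, and the macro variable ngroups is a placeholder that you must set to your own number of groups (3 here is arbitrary). Both models have the same fixed effects, so their REML likelihoods can be compared.

```sas
%let ngroups = 3;   /* placeholder: set to the number of levels of mygroupvar */

/* Model 1: common residual variance */
ods output FitStatistics=fit_common;
proc mixed data=mydata;
  class mygroupvar;
  model y = mygroupvar;
run;

/* Model 2: separate residual variance for each group */
ods output FitStatistics=fit_hetero;
proc mixed data=mydata;
  class mygroupvar;
  model y = mygroupvar;
  repeated / group=mygroupvar;
run;

/* Difference in -2 Res Log Likelihood, referred to a chi-square */
/* with (number of groups - 1) degrees of freedom                */
data lrt;
  merge fit_common(where=(Descr='-2 Res Log Likelihood')
                   rename=(Value=m2rll_common))
        fit_hetero(where=(Descr='-2 Res Log Likelihood')
                   rename=(Value=m2rll_hetero));
  chisq   = m2rll_common - m2rll_hetero;
  df      = &ngroups - 1;
  p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt;
run;
```

A large p_value would be consistent with staying with the simpler common-variance model. The chi-square reference distribution is an approximation, so treat the result as a guide rather than a verdict.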
