When I run ANOVA with proc mixed (or proc glm) and use the lsmeans option to compare group means (balanced data, e.g., n=12 samples in each group), I get identical standard errors (SE) for every mean.
Based on manual calculation in Excel, I know that the standard deviation (SD) of each group is quite different, and I also remember that SE = SD/sqrt(n), where n is the sample size.
If the SDs are different, why does the lsmeans option give identical SEs for all of the group means?
I believe you need to study up on some of the basic assumptions of analysis of variance, in particular the assumption of homogeneity of variance. If you start out assuming that all groups have equal variance, it is not surprising that the best linear unbiased estimates (the LSMEANS) all have the same estimate of variability (standard error), at least when the groups are all the same size, as yours are. Further, the calculation of the standard error of any estimate or difference between estimates is based on a single, pooled value: the mean squared error in the case of proc glm, and the combined quadratic form (see the documentation) in proc mixed.
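To make the pooling concrete, here is a minimal sketch in Python (not SAS) that compares the "Excel-style" SE, SD/sqrt(n), with the pooled SE, sqrt(MSE/n), for a balanced one-way layout. The data values and the function names (naive_se, pooled_mse, pooled_se) are made up for illustration, and the calculation is only meant to mimic what an ordinary least squares fit would give for a simple one-way model.

```python
import math
import statistics

# Hypothetical balanced data: three groups of n = 12 with clearly different spread.
groups = {
    "A": [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 11, 9],
    "B": [10, 14, 6, 12, 8, 15, 5, 11, 9, 13, 7, 10],
    "C": [10, 20, 0, 15, 5, 18, 2, 12, 8, 16, 4, 10],
}


def naive_se(values):
    """Per-group SE as computed by hand: that group's own SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def pooled_mse(groups):
    """Mean squared error: within-group sum of squares / error degrees of freedom."""
    ss_within = 0.0
    df_error = 0
    for values in groups.values():
        mean = statistics.fmean(values)
        ss_within += sum((x - mean) ** 2 for x in values)
        df_error += len(values) - 1
    return ss_within / df_error


def pooled_se(groups):
    """SE of each group mean under the equal-variance assumption: sqrt(MSE / n_i)."""
    mse = pooled_mse(groups)
    return {name: math.sqrt(mse / len(values)) for name, values in groups.items()}


if __name__ == "__main__":
    pooled = pooled_se(groups)
    for name, values in groups.items():
        print(name, "naive SE:", round(naive_se(values), 3),
              "pooled SE:", round(pooled[name], 3))
```

With equal group sizes the pooled column is the same for every group, while the naive column follows each group's own SD. With unequal group sizes the pooled SEs would differ, but only through n, never through the individual SDs.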
Hi Steve, thanks so much for your reply, your comment really helped to shed light on the source(s) of my confusion.
I thought the reason for testing ‘variance homogeneity’ was to ensure that all groups being compared have comparable spread in data points. Also, I thought ‘standard errors’ provide information on how ‘precise’ a given group mean is (i.e., if multiple samples were repeatedly drawn from the same population, about two thirds of these samples would be expected to have mean values between one SE above and below the estimated mean).
Given such compartmentalized understanding of these concepts, it is difficult to fully comprehend your comment on how …it is not surprising to see identical standard errors if the variances are equal... (I'll study up on it. Meanwhile, I welcome any help for me to connect the dots)
Also, I should study up on the method of value pooling and how SAS calculates lsmeans and standard errors.
For now, my question from the original post has evolved into:
Before I conduct ANOVA, I check my data against the ANOVA assumptions, and I know that the groups being compared have variances that are not significantly different (but, of course, not identical). If the variances are not identical, why should the lsmeans of the different groups have identical standard errors?
Lsmeans are solutions to a series of simultaneous equations constructed so that they are the best linear unbiased estimators of central tendency. They consider the whole of the data collected, and adjust for imbalance in numbers between groups. Since we are dealing with a system, it then becomes easy (under ordinary least squares estimation) to derive a single estimate of variability (MSE) that applies to each and every lsmean. This gives rise to the standard error of the mean: in a balanced one-way layout, the standard error of each lsmean is sqrt(MSE/n), so equal group sizes give equal standard errors no matter how much the individual group SDs differ. Also from this estimate of variability, we can solve for the standard error of the difference between two means, and perform statistical comparisons. This is why standard errors are important: they tell us about the possible distribution of the parameter being estimated, whereas standard deviations tell us about the distribution of the values that were measured. This is a subtle but critical difference.
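For reference, the quantities described above can be written out for the simplest case. This is a sketch that assumes a one-way fixed-effects model with k groups, group sizes n_i, and N observations in total; the symbols are my own notation, not taken from the SAS documentation.

```latex
\[
\mathrm{MSE} \;=\; \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\bigl(y_{ij}-\bar{y}_{i\cdot}\bigr)^{2}}{N-k},
\qquad
\mathrm{SE}\bigl(\bar{y}_{i\cdot}\bigr) \;=\; \sqrt{\frac{\mathrm{MSE}}{n_i}},
\qquad
\mathrm{SE}\bigl(\bar{y}_{i\cdot}-\bar{y}_{i'\cdot}\bigr) \;=\; \sqrt{\mathrm{MSE}\left(\frac{1}{n_i}+\frac{1}{n_{i'}}\right)}
\]
```

The individual group SDs enter only through the pooled MSE, so with n_i = 12 in every group each mean gets the same standard error, and so does each pairwise difference.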
Proc mixed moves beyond proc glm to use likelihood-based estimation, so that we can accommodate structural assumptions about the error variances and covariances. In fact, you can model the error in such a way that heterogeneous variances are accommodated. But you still do NOT get standard deviations. The variability estimates are standard ERRORs of the parameters that provide an optimal fit to the data, given the model and error structure chosen.
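As a rough illustration of what a heterogeneous-variance model changes, the Python sketch below gives each group its own variance estimate instead of one pooled value. In proc mixed I believe the usual way to ask for this is the group= option on the repeated statement, but the code here is not SAS and does not reproduce its likelihood-based fitting; it assumes a simple one-way layout, where the group-specific estimates should reduce to the ordinary sample variances. The data and the function names (hetero_se, hetero_se_diff) are hypothetical.

```python
import math
import statistics

# Hypothetical balanced data with unequal spread between groups.
groups = {
    "A": [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 11, 9],
    "B": [10, 14, 6, 12, 8, 15, 5, 11, 9, 13, 7, 10],
    "C": [10, 20, 0, 15, 5, 18, 2, 12, 8, 16, 4, 10],
}


def hetero_se(groups):
    """SE of each group mean when every group keeps its own variance estimate."""
    return {
        name: math.sqrt(statistics.variance(values) / len(values))
        for name, values in groups.items()
    }


def hetero_se_diff(groups, first, second):
    """SE of the difference between two group means, without pooling variances."""
    a, b = groups[first], groups[second]
    return math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )


if __name__ == "__main__":
    for name, se in hetero_se(groups).items():
        print(name, "SE of mean:", round(se, 3))
    print("SE of A - C:", round(hetero_se_diff(groups, "A", "C"), 3))
```

Under this error structure the standard errors are no longer identical across groups, but they are still statements about the precision of the estimated means, not about the spread of the raw observations.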