I am a relatively new SAS user in serious need of your help (I cannot afford to lose any more sleep). I suspect there are very simple answers to my questions, and anyone who has used PROC GLM or PROC MIXED may be able to answer them. However, I cannot figure it out myself or find any useful information on the Internet. I hope someone can help me out. It's a bit long, but your help is greatly appreciated!
To give you just enough background: I conducted an ANOVA and compared the means of data collected from four different sites that received identical treatments. The objective was to determine how different sites respond (in terms of soil nutrient levels) to a given set of identical treatments (e.g., a certain plant species cultivated at a certain number of plants per unit area). At each site, three identical plots were set up as replications, which also acted as blocks to account for spatial variability of soil nutrients.
As a test, I used both PROC GLM and PROC MIXED with code containing an LSMEANS (least squares means) statement, and both ran without problems. However, the trouble began when I examined the results more closely:
Although the means (i.e., the lsmean values, shown as 'Estimates' in the output) varied considerably among sites, the standard errors (SE) were identical for all sites, and I don't understand why.
Going by what I learned in my statistics class, SE is calculated as:
SE = Standard deviation / (square root of sample size)
For my datasets, the sample sizes are identical for all sites, and I know for a fact that the standard deviations (SD) of these sites are quite different from each other (based on my manual calculations in Excel).
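A minimal sketch of the per-site calculation described above (SD computed separately for each site, then SE = SD / sqrt(n)), written in Python rather than SAS. The function names `per_site_summary` and `sd_from_se` and the example values are hypothetical, chosen only for illustration; they are not the actual site data. The sample SD (n - 1 denominator) is assumed, which matches Excel's STDEV.S.

```python
from math import sqrt
from statistics import mean, stdev


def per_site_summary(data):
    """Return {site: (mean, sd, se)}, computing SD and SE separately for each site."""
    summary = {}
    for site, values in data.items():
        n = len(values)
        sd = stdev(values)  # sample SD with n - 1 denominator (like Excel's STDEV.S)
        se = sd / sqrt(n)   # SE = SD / sqrt(n), using this site's own SD
        summary[site] = (mean(values), sd, se)
    return summary


def sd_from_se(se, n):
    """Back-calculate SD from SE by inverting the formula: SD = SE * sqrt(n)."""
    return se * sqrt(n)


# Hypothetical, made-up values (three plots per site), for illustration only
example = {
    "Site A": [10.0, 12.0, 14.0],
    "Site B": [20.0, 21.0, 22.0],
}

for site, (m, sd, se) in per_site_summary(example).items():
    print(f"{site}: mean={m:.2f}, SD={sd:.2f}, SE={se:.2f}")
```

With equal n, per-site SEs computed this way differ whenever the per-site SDs differ (here SD = 2.00 vs 1.00, so SE = 1.15 vs 0.58).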
My first question (Problem 1) is: why are the SEs identical for all sites? What am I missing here?
In the end, I would like to produce a simple table listing the mean of each site, along with the results of a post-hoc multiple comparison test (such as Tukey) to indicate the presence or absence of significant differences between sites.
Along with the mean values (i.e., lsmean values), I would like to indicate the spread of data points at each site. I don't wish to get into the discussion of whether to report SE or SD with the means here; I simply wish to know the standard DEVIATION of the lsmean values, which is not provided in the PROC GLM or PROC MIXED output.
My second question (Problem 2) is: is there a way to obtain the standard DEVIATION for LSMEAN values with PROC GLM or PROC MIXED? If the answer is yes, how? If the answer is no, why not?
(Technically, I could back-calculate SD from SE using the formula above. But because the SEs for all sites are identical (Problem 1), that would only give me identical SDs for all sites, which makes no sense. So I'm looking for a way to obtain SD other than calculating it from SE.)
I also tried running PROC GLM and PROC MIXED without the random ‘blocking’ effect (i.e., samples from three blocks pooled together for each site), but SE remained identical for all sites (Problem 1), and I was not able to obtain SD for lsmean values (Problem 2).
I understand that if the data were always balanced, I would have the option to use MEANS (instead of LSMEANS), but some nutrients have extreme outliers, and I also need to be able to handle unbalanced data. That is why I prefer to use LSMEANS.
Please help me by answering the two problems above! It is entirely possible that I lack a basic understanding of LSMEANS and am missing the point altogether. If that is the case, please kindly explain what I am missing.