Dear All,
I am using PROC GLM to run a simple ANOVA model, as shown below:
proc glm data = mydata;
class IndepVar Gender;
model DepVar= IndepVar Gender/solution ss3;
lsmeans IndepVar/tdiff pdiff stderr;
run;
I have a variable, say IndepVar, with 4 levels and another independent variable called Gender. I also want to check whether the differences between the LS-means are significant (I am not concerned about the "adjust" option here).
However, when I obtain the table of LS-mean differences, the p-values do not seem correct to me. For instance, in some cases larger differences have higher p-values, when they should have lower ones, and vice versa.
Because of this, I am trying to check the formula that SAS uses to compute the t statistic. I found this webpage:
https://support.sas.com/kb/24/984.html
The above page notes that:
The TDIFF option computes a t statistic as follows for the Row i vs. Row i' difference:
t = [LSMEAN_i - LSMEAN_i'] / [ (sqrt(MSE)/nc) * sqrt( Σ_j 1/n_{ij} + Σ_j 1/n_{i'j} ) ],
In the above formula, I believe the LSMEAN values are the ones from the LSMEANS table, the MSE is the model MSE, and nc = 4 (in my case the variable has 4 levels). However, I am stuck on the final part of the formula, the terms Σ_j 1/n_{ij} + Σ_j 1/n_{i'j}. Are n_{ij} and n_{i'j} simply the numbers of observations in a particular group, or something else? This part is not very clear to me from the example on the webpage. Could someone please explain it?
Thanks in advance!
If you run
proc freq data = mydata;
tables IndepVar*Gender / nocum norow nocol nopercent;
run;
you will get a table of cell counts. These cell counts are the n_{ij} to use in the formula. For example, if you are computing the difference between the 1st and 2nd levels of IndepVar, use the counts in the 1st and 2nd rows of the PROC FREQ table. If you are interested in the difference between the 3rd and 4th levels, use the counts in the 3rd and 4th rows.
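To make the arithmetic concrete, here is a minimal sketch of the calculation in a DATA step. Every number in it is hypothetical; substitute the LS-means, MSE, error degrees of freedom, and cell counts from your own PROC GLM and PROC FREQ output. It also assumes that nc is the number of terms in each sum over j (that is, the number of Gender levels, 2 here), which is how I read the formula, and that the formula from the SAS note applies to your model.

```sas
/* Hand check of one TDIFF entry. All values below are hypothetical. */
data tcheck;
   lsmean_i  = 10.2;   /* LS-mean for level i of IndepVar (hypothetical)  */
   lsmean_ip = 8.7;    /* LS-mean for level i' of IndepVar (hypothetical) */
   mse       = 4.5;    /* error mean square from PROC GLM (hypothetical)  */
   dfe       = 95;     /* error degrees of freedom (hypothetical)         */
   nc        = 2;      /* assumed: number of Gender levels summed over j  */

   /* cell counts from the PROC FREQ table (hypothetical) */
   n_i1  = 12;  n_i2  = 15;   /* row i,  Gender levels 1 and 2 */
   n_ip1 = 10;  n_ip2 = 14;   /* row i', Gender levels 1 and 2 */

   /* standard error of the difference, following the quoted formula */
   se = (sqrt(mse) / nc) * sqrt(1/n_i1 + 1/n_i2 + 1/n_ip1 + 1/n_ip2);

   t = (lsmean_i - lsmean_ip) / se;        /* t statistic       */
   p = 2 * (1 - probt(abs(t), dfe));       /* two-sided p-value */
run;

proc print data = tcheck;
   var se t p;
run;
```

Note that the standard error depends on the cell counts of the two rows being compared. With unequal cell sizes, a larger difference between two LS-means can therefore have a larger p-value than a smaller difference, if the rows involved have fewer observations.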