I will admit from the start that this isn't really a programming question. It's more a question about interpreting results and the statistical concept behind them.
I have a data set with mean salaries of different levels of professors (full, assistant, associate) from over 1,000 US colleges/universities. I grouped all of the schools into 4 regions (Northeast, South, Midwest, West) and they're all rated on some kind of level of research facility (I, IIA, IIB). I ran a two-way ANOVA test to model the overall average of all levels of professor, based on both region and research level:
proc glm data=Prof_Sal;
class REGION COL_TYPE;
model AVE_SAL_ALL = REGION COL_TYPE REGION*COL_TYPE;
run;
The output is below. I don't understand what the Type I and Type III tables represent. If the two tables are the same, that means that there's no interaction, right? The F-statistics differ between the two tables but the p-values are the same, and I don't understand what that means either. Please help me interpret the table!
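In general, Type III is appropriate for this type of model, while Type I is not. Type I sums of squares are sequential: each effect is adjusted only for the effects listed before it in the MODEL statement, so the results depend on that order. Type III sums of squares adjust each effect for all the other effects in the model.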
REGION is statistically significant. COL_TYPE is statistically significant. The interaction is statistically significant. (All at the alpha=0.05 level)
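If it helps to see the order dependence of Type I directly, here is a minimal sketch that reuses the data set and variable names from your post and only swaps the order of the two main effects. The SS1 and SS3 options on the MODEL statement request the two tables explicitly. With unbalanced data I would expect the Type I lines for REGION and COL_TYPE to change compared with your original run, while the Type III lines stay the same. The interaction is listed last in both runs, so its line should match across the two tables. Identical tables are what a balanced design produces; they say nothing about whether an interaction is present.

```sas
proc glm data=Prof_Sal;
  class REGION COL_TYPE;
  /* Same model as the original post, with the main effects in the opposite order. */
  /* SS1 and SS3 request the Type I and Type III tables explicitly.               */
  model AVE_SAL_ALL = COL_TYPE REGION REGION*COL_TYPE / ss1 ss3;
run;
quit;
```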
Thanks for this information. To make sure I'm analyzing this correctly, what this output says is that the mean salary is statistically significantly different: by "region," by itself, AND by "college level," by itself. The output also says that there are statistically significant differences between mean salaries for at least one of the combinations of "region" and "college level." Since there are four regions and three college levels, 4 x 3 = 12, so there are twelve different combinations.
Because there is statistically significant evidence of interaction between "region" and "college level" upon mean salary, it would be inappropriate to run separate one-way ANOVAs, one for "region" and mean salary and one for "college level" and mean salary...right? Instead, I need to run a Tukey test to see which of the twelve combinations have statistically significantly different mean salaries.
Thanks again for your help with this.
I agree with all that you wrote except the part where you said "I need to run a Tukey test..."
You don't "need to" run the Tukey test; it's one option, among many, for identifying the parts of the interaction that are statistically different.
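As a rough sketch of two of those options, again using only the names from your original code: the first LSMEANS statement requests Tukey-adjusted pairwise comparisons among the twelve cell means (66 pairs, so the output is long), and the second uses SLICE= to test the REGION effect separately within each COL_TYPE. Which one is more useful depends on the question you want to answer, so treat this as a starting point rather than a recommendation.

```sas
proc glm data=Prof_Sal;
  class REGION COL_TYPE;
  model AVE_SAL_ALL = REGION COL_TYPE REGION*COL_TYPE;
  /* Option 1: all pairwise differences among the 12 REGION*COL_TYPE cell means, */
  /* with a Tukey adjustment for multiple comparisons.                           */
  lsmeans REGION*COL_TYPE / pdiff adjust=tukey;
  /* Option 2: test the REGION effect separately at each level of COL_TYPE. */
  lsmeans REGION*COL_TYPE / slice=COL_TYPE;
run;
quit;
```

Using SLICE=REGION instead would test the COL_TYPE effect within each region.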