Solved: Re: PROC TTEST versus PROC GLM - CONTRAST

AJones · Posted 05-22-2013 02:20 PM

I'm running tests on a dataset where my CLASS variable has 9 levels. Two of the levels are very similar, and I want to determine whether they are significantly different from each other to see if they can actually be separated or whether they need to be combined.

When I run PROC TTEST restricted to these two levels, they are shown to be significantly different from each other (Pr > |t| < 0.0001 and Pr > F < 0.0001).

When I run PROC GLM on the same 'var' variable, however, the CONTRAST statement for these two levels returns a significant but different result (Pr > F = 0.0407).

Should I expect TTEST and CONTRAST to find the same significance? Is it recommended to rely on one or the other when testing for a significant difference?

Also, if I run the full model (with an additional CLASS variable), the Pr > F in the CONTRAST output for the two levels increases to 0.3231. Do the levels need to be significantly different from each other in this full model, or only in the more basic model? I realize this decision may be left to the discretion of the modeler.

SteveDenham · Posted 05-23-2013 08:28 AM

There really isn't a reason to believe that the contrast in GLM and the t test will give the same answer, as the data used are not the same. The t test uses only the two levels you selected. GLM uses all of the groups, and bases the error term for the contrast on the mean square error (MSE) pooled across all of them, under the assumption of homogeneity of variance. The other groups contribute to your knowledge of the estimated standard error.
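To make the difference concrete, here is a minimal sketch of the two analyses side by side. The data set name `mydata`, the CLASS variable `group` with levels 'A' through 'I', and the response `y` are placeholders for illustration, not names taken from the original post.

```sas
/* Hypothetical names: mydata, group (9 levels 'A'-'I'), y */

/* Two-sample t test: only levels A and B enter the analysis */
proc ttest data=mydata;
   where group in ('A', 'B');
   class group;
   var y;
run;

/* One-way ANOVA on all 9 levels: the contrast compares A with B,
   but its error term is the MSE estimated from all 9 groups */
proc glm data=mydata;
   class group;
   model y = group;
   contrast 'A vs B' group 1 -1 0 0 0 0 0 0 0;
   /* ESTIMATE prints the difference, its standard error, and a t value */
   estimate 'A - B' group 1 -1 0 0 0 0 0 0 0;
run;
quit;
```

The coefficients follow the order of the levels listed in the Class Level Information table, so it is worth checking that table before trusting the positions of the 1 and -1.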

Adding the second CLASS variable removes another source of variation. A good example would be a response with an additive gender effect. Adding gender as a CLASS variable would reduce the variability estimate but, if gender is unevenly distributed across the groups, it would also remove a major difference between the group levels.
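As a rough sketch of that fuller model, assuming the same placeholder names (`mydata`, `group`, `y`) and using `gender` only as a stand-in for the second CLASS variable, which the original post does not name:

```sas
/* Hypothetical names: mydata, group (9 levels 'A'-'I'), gender, y */
proc glm data=mydata;
   class group gender;
   model y = group gender;   /* main effects only, no interaction */
   /* Same A vs B comparison, now adjusted for gender */
   contrast 'A vs B' group 1 -1 0 0 0 0 0 0 0;
run;
quit;
```

Here the contrast is tested against the residual mean square left after both effects are fitted, and the A vs B difference is adjusted for gender, so its p-value can move in either direction relative to the one-way model.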

In the end, the model ought to reflect the design (or in Walter Stroup's words: What would Fisher do?). Combining or not combining two levels is more than just a question of significance testing.

Steve Denham

AJones · Posted 05-23-2013 12:05 PM

Thank you. I thought I was restricting to the two levels in GLM by using a WHERE statement, but in fact I had not thought to delete the 7 extra zero placeholders in the CONTRAST statement. That must have been throwing it off, because the probability does indeed now match the TTEST.
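For anyone following along, the corrected run might look something like the sketch below. The names `mydata`, `group`, `y`, and the level values 'A' and 'B' are placeholders, not my actual variables.

```sas
/* Hypothetical names: mydata, group, y; levels 'A' and 'B' */
proc glm data=mydata;
   where group in ('A', 'B');   /* only two CLASS levels remain */
   class group;
   model y = group;
   contrast 'A vs B' group 1 -1;   /* one coefficient per remaining level */
run;
quit;
```

With only two levels in the model, the contrast F is the square of the Pooled t value from PROC TTEST, so the two p-values agree. It will not generally match the Satterthwaite line, which does not assume equal variances.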

What you say about the additional CLASS variable certainly makes sense.

As far as defining/combining variable levels, I agree that this should not be done post-experiment. In this case, I'm doing a meta-analysis and looking for trends to determine whether certain conditions are significant across studies. I am neutral as to whether they are combined or not, but will certainly comment on it either way in my analysis.

I appreciate your thoughtful response and explanation!
