AJones
Calcite | Level 5

I'm running tests on a dataset where my CLASS variable has 9 levels.  Two of the levels are very similar, and I want to determine whether they are significantly different from each other to see if they can actually be separated or whether they need to be combined.

When I run PROC TTEST restricted to these two levels, they are shown to be significantly different from each other (Pr > |t| is <0.0001 and Pr > F is <0.0001).

When I run a PROC GLM modeling the same 'var' variable for these two levels, however, the CONTRAST statement returns a significant yet different result (Pr > F = 0.0407). 

Should I expect TTEST and CONTRAST to find the same significance?  Is it recommended to rely on one or the other when testing for a significant difference?

Also, if I run the full model (with an additional CLASS variable), the Pr > F in the CONTRAST output for the two levels increases to 0.3231.  Would you think that the levels need to be significantly different from each other in this full model, or only in the more basic model?  I realize that perhaps this decision may be left to the discretion of the modeler.

Accepted Solutions
SteveDenham
Jade | Level 19

There really isn't a reason to believe that the contrast in GLM and the t test will give the same answer, as the data used are not the same.  GLM uses all of the groups, and bases the contrast on the mean square error (MSE), under the assumption of homogeneity of variance.  The other groups contribute to your knowledge of the estimated standard error.
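
To make the point about the error term concrete, here is a minimal sketch in plain Python (not SAS output) of the two calculations. The data are hypothetical, and the function names `pooled_t` and `contrast_f` are made up for illustration. It assumes the equal-variance form of the t test and a one-way layout, so the MSE is simply the pooled within-group variance across every level in the model.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its error df."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_stat = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return t_stat, na + nb - 2

def contrast_f(groups, i, j):
    """F statistic for level i vs level j, using the MSE pooled over ALL levels."""
    df_error = sum(len(g) for g in groups) - len(groups)
    mse = sum((len(g) - 1) * variance(g) for g in groups) / df_error
    diff = mean(groups[i]) - mean(groups[j])
    f_stat = diff ** 2 / (mse * (1 / len(groups[i]) + 1 / len(groups[j])))
    return f_stat, df_error

# Hypothetical data: two tight levels plus one noisy level that inflates the MSE.
groups = [
    [10.0, 10.2, 9.9, 10.1],    # level A
    [10.6, 10.8, 10.5, 10.7],   # level B
    [4.0, 16.0, 9.0, 13.0],     # level C (much larger variance)
]

t, df_t = pooled_t(groups[0], groups[1])
f, df_f = contrast_f(groups, 0, 1)
print(f"t test on A and B only: t^2 = {t**2:.3f} on {df_t} error df")
print(f"contrast A vs B, all levels: F = {f:.3f} on {df_f} error df")
```

The difference in means is identical in both calculations. Only the variance estimate and its degrees of freedom change, and that alone is enough to move the test statistic a long way when the other levels are more variable than the two being compared.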

Adding the additional CLASS variable removes an additional source of variation.  One good example would be for a variable that had an additive gender effect.  Adding gender as a class variable would reduce the variability estimate, but also would remove a major difference between levels of the groups.

In the end, the model ought to reflect the design (or in Walter Stroup's words: What would Fisher do?).  Combining or not combining two levels is more than just a question of significance testing.

Steve Denham

AJones
Calcite | Level 5

Thank you.  I thought I was restricting to the two levels in GLM by using a WHERE statement, but in fact I had not thought to delete the 7 extra zero placeholders in the CONTRAST statement.  That must have been throwing it off, because the probability does indeed now match the TTEST. 
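
For anyone checking the same thing, here is a rough sketch in plain Python (not SAS) of why the two results agree once only the two levels are in the model: the contrast F is then the square of the pooled t statistic. The data and the helper names `pooled_t` and `contrast_f` are hypothetical, and the equal-variance t test is assumed.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def contrast_f(groups, coeffs):
    """F statistic for the contrast sum(c_k * mean_k), MSE pooled over the groups given."""
    df_error = sum(len(g) for g in groups) - len(groups)
    mse = sum((len(g) - 1) * variance(g) for g in groups) / df_error
    estimate = sum(c * mean(g) for c, g in zip(coeffs, groups))
    var_factor = sum(c ** 2 / len(g) for c, g in zip(coeffs, groups))
    return estimate ** 2 / (mse * var_factor)

# Hypothetical data for the two levels of interest (unequal group sizes).
level_a = [10.0, 10.2, 9.9, 10.1, 10.3]
level_b = [10.6, 10.8, 10.5, 10.7]

t = pooled_t(level_a, level_b)
f = contrast_f([level_a, level_b], [1, -1])  # two levels, two coefficients

print(f"t^2 = {t**2:.6f}, contrast F = {f:.6f}")
print("match:", math.isclose(f, t ** 2))
```

With a 1-degree-of-freedom contrast and the same error degrees of freedom, F = t^2 also means the two p-values coincide.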

What you say about the additional CLASS variable certainly makes sense.

As far as defining/combining variable levels, I agree that this should not be done post-experiment.  In this case, I'm doing a meta-analysis and looking for trends to determine whether certain conditions are significant across studies.  I am neutral about whether they are combined or not, but will certainly comment on it either way in my analysis.

I appreciate your thoughtful response and explanation!
