WWD
Obsidian | Level 7

In my previous question, the discussion was about what to do when interaction terms are not statistically significant.  In particular, winter-summer and spring-summer were the only season pairs that showed a statistically significant interaction with sale price given heating units.  The other 4 season pairs were not statistically significant.

 

If the model that was developed were to be put in production to make predictions, is it statistically defensible to create an indicator variable to indicate the only pairings that should be included?

 

For example,

 

There are six different season pairs: (winter, spring), (winter, summer), (winter, fall), (spring, summer), (spring, fall), (summer, fall).  If one of these pairs produced a difference, for example (winter, spring), would you create an indicator variable for a winter-spring difference?  If there were two combos that produced a difference, would you then create two indicator variables—one for each interaction?

 

This is totally cryptic.  Please don’t hesitate to e-mail for extra description.

 

Bill Donaldson 

 

 

3 REPLIES
Cynthia_sas
SAS Super FREQ
Hi:
We've asked the instructors to comment on this. Stay tuned!
Cynthia
MarcHuber
SAS Employee

Hi, Bill,

 

That is an interesting thought.  Unfortunately, it would not really be possible to implement, even if it were advisable.

 

Think about how you might try to code a "winter-summer" variable, as your question mentions.  You might try to add a new column to your "design matrix" (the matrix of data that is actually fed to SAS for your model parameter estimations).  How would you code it?  If you assign the value '1' to winter and '0' to summer (or any two values in the universe that you'd like to use) then how would you code fall and spring?  If you coded them as missing then SAS would eliminate those observations from its calculations.  So, essentially you'd have an analysis of data (in my best movie trailer voice), "in a world where there is only summer and winter...".  That's not what you want.
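To make the coding problem concrete, here is a minimal sketch in Python rather than SAS. The toy data, the helper name `winter_vs_summer`, and the filtering step that stands in for dropping observations with missing predictors are all assumptions made for illustration; this is not how SAS does it internally.

```python
# Hypothetical toy data: (season, sale_price). All values are invented.
rows = [
    ("winter", 210.0),
    ("spring", 195.0),
    ("summer", 180.0),
    ("fall", 200.0),
    ("winter", 220.0),
    ("summer", 175.0),
]

def winter_vs_summer(season):
    """Code winter as 1, summer as 0, and any other season as missing (None)."""
    return {"winter": 1, "summer": 0}.get(season)

# Add the proposed "winter-summer" column to each observation.
coded = [(season, price, winter_vs_summer(season)) for season, price in rows]

# Stand-in for listwise deletion: drop every row with a missing predictor.
used = [row for row in coded if row[2] is not None]

seasons_used = sorted({season for season, _, _ in used})
print(len(rows), "rows in,", len(used), "rows used")
print("seasons left in the analysis:", seasons_used)
```

In this sketch the spring and fall rows never reach the fit, so the model only describes winter and summer.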

 

There are also issues of taking advantage of chance relationships in the sample you are using that are only partially addressed with multiple comparisons adjustments, such as Tukey's HSD or the Bonferroni adjustment.  You have to be careful about overfitting your model to the sample used to estimate parameters. This leads to "false discovery" which is what it sounds like - discovering connections that don't really exist.
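As a rough illustration of what a Bonferroni adjustment does with the six season pairs, here is a small Python sketch. The p-values are hypothetical numbers chosen only to show the mechanics; they are not results from the course data, and the 0.05 level is an assumed choice.

```python
from itertools import combinations

seasons = ["winter", "spring", "summer", "fall"]
pairs = list(combinations(seasons, 2))  # the six season pairs

alpha = 0.05                            # assumed overall significance level
bonferroni_alpha = alpha / len(pairs)   # per-comparison threshold

# Hypothetical raw p-values, one per pair, in the order of `pairs`.
hypothetical_p = dict(zip(pairs, [0.20, 0.004, 0.45, 0.03, 0.60, 0.35]))

# Pairs flagged without and with the Bonferroni adjustment.
unadjusted = [pair for pair in pairs if hypothetical_p[pair] < alpha]
adjusted = [pair for pair in pairs if hypothetical_p[pair] < bonferroni_alpha]

print("per-comparison threshold:", round(bonferroni_alpha, 4))
print("flagged without adjustment:", unadjusted)
print("flagged with Bonferroni:   ", adjusted)
```

With these made-up numbers, two pairs pass the unadjusted 0.05 cutoff but only one survives the stricter per-comparison threshold. The adjustment controls the error rate within the family of tests; it does not by itself protect against fitting the model to quirks of one sample.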

 

It's worthwhile to say that the concept of multiple comparison adjustment isn't without controversy.  A basic issue is what should constitute a family of tests that I should adjust for.  For instance, should I adjust for all hypotheses I've tested in my whole statistical life because I know that the Type I error rate assures (under normal regression assumptions) that a proportion alpha of my tests of true null hypotheses will, on average, appear significant just by chance?
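One way to see why the choice of family matters is to look at how the chance of at least one false positive grows with the number of tests. The sketch below assumes independent tests of true null hypotheses, which is a simplification (pairwise comparisons from one model are not independent); the function name is made up for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent
    tests of true null hypotheses, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# How the rate grows as the "family" gets larger (alpha = 0.05 assumed).
for n_tests in (1, 6, 20, 100):
    print(n_tests, round(familywise_error_rate(0.05, n_tests), 3))
```

Under that independence assumption, six tests at the 0.05 level already carry roughly a 26% chance of at least one spurious finding, and the figure keeps climbing as more tests are counted in the family.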

 

I guess the easy answer is just the practical problem of creating the variable that you're thinking about.  But there are deeper issues, as well.

 

Marc

WWD
Obsidian | Level 7

Marc:

 

I did a little poking around on the internet and found a discussion about what happens when one group within a category variable is not statistically different from the reference group.  In the discussion, the author said that a category variable is “all or nothing”.  By that the author was saying that it is not statistically defensible to simply remove one dummy variable from the fitted model, which would amount to collapsing that category into the reference group.  If you refit the model, using the “new” category variable with one fewer group, you’d probably be OK, I think.  (In the advanced modeling module, there are techniques for collapsing categorical variables.)
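A small Python sketch of the difference between the two ideas, under assumed names: `dummy_code` is a made-up helper, summer is assumed to be the reference level, and folding fall into summer is purely a hypothetical choice for illustration. The point is that the category variable itself is recoded first, and the model would then be refit on the new columns.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

seasons = ["winter", "spring", "summer", "fall", "winter", "fall"]

# Original variable: four levels with summer as reference, so three columns.
levels_full, x_full = dummy_code(seasons, reference="summer")

# Hypothetical collapse: recode fall into the reference group, then
# rebuild the dummy columns from the recoded variable before refitting.
collapsed = ["summer" if s == "fall" else s for s in seasons]
levels_small, x_small = dummy_code(collapsed, reference="summer")

print("columns before collapsing:", levels_full)
print("columns after collapsing: ", levels_small)
```

After the recode, fall observations have zeros in every dummy column, exactly like summer, and every row still takes part in the refit.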

 

I don’t know enough at this point in time to even say if what I found is equivalent to what you wrote.  I have some thinking to do.

 

Thank you for looking into my question,

 

Bill

 
