Mathis1
Quartz | Level 8

Hello there,

I'm trying to predict a price variable from around 8 predictors (each of them categorical, with multiple levels). Before analysing the two-way interactions, I'd like to find out whether one interaction A*B is more significant than the others, so that I can focus my analysis on it. Any advice?

 

Thank you for your help !

🙂

PaigeMiller
Diamond | Level 26

I don't think there is a simple yes-or-no answer to your question. It depends on the data, and it also depends on what you mean by "most important interaction" (which may or may not be different from the "most significant interaction" you asked about). Furthermore, when you build a model and interpret the results, many other things come into play. So my answer is that there is no simple answer: it depends.

--
Paige Miller
SteveDenham
Jade | Level 19

And if you only want to look at two-way interactions, add "@2" to the MODEL statement that @Ksharp gave.

For example:

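PROC HPGENSELECT data=<yourdatafile>;
CLASS a b c d; /* List all of your variables of interest here */
model y=a|b|c|d@2/distribution=normal; /* Again, include all your variables of interest */
selection method=stepwise; /* Stepwise selection over the main effects and two-way interactions */
run;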

You should probably also use an include option, so that the main effects are always kept in the model and the selection process only considers the two-way interactions.

 

And now for the caveat: just because an interaction is "more significant" does not mean it is more important from a process standpoint. That is a matter of subject-matter knowledge, so proceed with great caution.

 

SteveDenham

PaigeMiller
Diamond | Level 26

I agree with @SteveDenham. It's easy to get SAS to do the calculations and present them. It takes a lot more to interpret the model results and pick "most important" or "most significant".

--
Paige Miller
StatDave
SAS Super FREQ

See this note on variable importance. One easy alternative discussed there is PROC ADAPTIVEREG. By default, it fits a flexible model by selecting the most important effects among all possible main effects and two-way interactions of the variables you specify in the MODEL statement. It also produces a table of variable importance values, which are based on the generalized cross-validation (GCV) criterion explained in the procedure documentation.

 

You might also want to be careful if you plan to assume that your price response variable is normally distributed. Depending on how large those values are, its distribution might be quite skewed, so a distribution such as the gamma or inverse Gaussian might be more appropriate. ADAPTIVEREG supports both of these as well as the normal distribution. The RsquareV macro discussed in the note is another possible way to assess importance.
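A minimal sketch of the ADAPTIVEREG approach, assuming a positive price response and eight categorical predictors. The data set name WORK.PRICES, the predictor names x1 to x8, and the simulated data (including the x1*x2 effect built into it) are all hypothetical, made up only so the example runs on its own; replace them with your own data. DIST=GAMMA reflects the skewness concern above, and the interaction order is left at its default, which as described above covers main effects and two-way interactions.

```sas
/* Hypothetical simulated data: 500 rows, eight categorical predictors x1-x8 */
data work.prices;
   call streaminit(2024);
   array x{8} x1-x8;
   do i = 1 to 500;
      do j = 1 to 8;
         x{j} = ceil(4*rand('uniform'));   /* levels 1 to 4 */
      end;
      /* Log-scale mean: effects of x1 and x3 plus one x1*x2 cell effect */
      mu = exp(3 + 0.3*x1 + 0.2*x3 + 0.4*(x1=2)*(x2=3));
      /* Gamma noise with mean 1 keeps price positive and right-skewed */
      price = mu * rand('gamma', 10) / 10;
      output;
   end;
   drop i j mu;
run;

/* Search main effects and (by default) two-way interactions; gamma response */
proc adaptivereg data=work.prices;
   class x1 x2 x3 x4 x5 x6 x7 x8;
   model price = x1 x2 x3 x4 x5 x6 x7 x8 / dist=gamma;
run;
```

The variable importance table in the output should rank the predictors, and the selected basis functions show which interactions, if any, entered the final model.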
