Don't mean to bump back to the top but a couple of thoughts:
As part of EDA you might want to look at these groupings. This could also be useful if you are hand-coding a decision tree to impute missing values (using the group mean in this instance). Note this only looks at the mean value of Carat across different cuts. One way could be:
* Carat by Cut;
proc means data = dataset nmiss mean;
class CUT;
var CARAT;
run;
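To show what "hand-coding" that imputation rule might look like downstream, here is a minimal sketch in plain Python with made-up data (the column names mirror the PROC MEANS step above, but the values are hypothetical): compute the mean of CARAT within each CUT group, then fill missing CARAT values with their group's mean.

```python
# Sketch of group-mean imputation on hypothetical data: fill a missing
# CARAT with the mean CARAT of that row's CUT group.
from statistics import mean

rows = [
    {"CUT": "Ideal",   "CARAT": 0.7},
    {"CUT": "Ideal",   "CARAT": 0.9},
    {"CUT": "Premium", "CARAT": 1.1},
    {"CUT": "Premium", "CARAT": None},  # missing value to impute
]

# Mean of CARAT by CUT, ignoring missing values (what PROC MEANS reports)
group_means = {}
for cut in {r["CUT"] for r in rows}:
    vals = [r["CARAT"] for r in rows if r["CUT"] == cut and r["CARAT"] is not None]
    group_means[cut] = mean(vals)

# Impute each missing CARAT with its group's mean
for r in rows:
    if r["CARAT"] is None:
        r["CARAT"] = group_means[r["CUT"]]

print(rows[-1]["CARAT"])  # -> 1.1, the Premium group mean
```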
I echo what others say on interaction variables. Spend some time thinking about design: when I read "dummy variables," I take that as levels of the categorical variable Cut (e.g. Cut_1, Cut_2, ..., Cut_n) populated with 0's and 1's. But that alone doesn't quite get you to your interaction. If you multiply each Cut_n by Carat and store the results in a single new variable, there is no weighting for better or worse cuts, since every level is being multiplied by the same 1. Also, if multicollinearity is an issue, it will show up in your VIFs. Remember to leave at least one degree of freedom in the categorical variable: if there are five levels of cut, your model should use no more than four dummies (SAS will give a warning in the log and output from PROC REG otherwise). You could also explore binning the cuts.
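To make the dummy/interaction point concrete, here is an illustrative Python sketch with hypothetical cut levels and carat values: keeping one interaction column per (non-reference) level preserves a separate slope for each cut, whereas collapsing every dummy-times-carat product into one variable just reproduces Carat and throws the cut information away.

```python
# Hypothetical data: each row has a CUT level and a CARAT value.
cuts = ["Fair", "Good", "Ideal"]   # one row per observation (made-up levels)
carat = [0.5, 0.8, 1.2]
levels = ["Good", "Ideal"]         # "Fair" omitted as the reference level

# One dummy column and one interaction column per non-reference level
dummies = {lv: [1 if c == lv else 0 for c in cuts] for lv in levels}
interactions = {lv: [d * x for d, x in zip(dummies[lv], carat)] for lv in levels}

# The "single variable" version: sum of dummy*carat over ALL levels.
# Because exactly one dummy is 1 in each row, this equals carat itself.
all_dummies = {lv: [1 if c == lv else 0 for c in cuts] for lv in cuts}
single = [sum(all_dummies[lv][i] * carat[i] for lv in cuts) for i in range(len(cuts))]
print(single == carat)  # -> True: the collapsed column carries no cut information
```

The separate columns in `interactions` are what let the regression estimate a different Carat slope for each cut level.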
Lastly, how you weigh p-values and goodness-of-fit (GOF) measures really, really depends on the purpose of your model.
A model built for statistical inference focuses on hypothesis testing on the sample. Here you want the "best" model (remember, "accuracy" is typically defined by your customer) on the sample that fits within those parameters, and you might give p-values significant weight over other metrics.
A model built for predictive accuracy should predict well both in-sample (data you have) and out-of-sample (data you do not have). Personally, when building models for predictive accuracy, I care less about p-values and more about building and deploying a highly accurate model (this is also why I care less about whether the intercept "makes sense"). Overfitting is a concern since the model will be deployed out-of-sample, and cross-validation is extremely useful here. For instance, take the in-sample data and do a random uniform 70/30 split (use a seed value for repeatability). Train (build) your model on the 70% and test (deploy) it on the 30%, then judge the accuracy: if the accuracy metrics are similar, your model should not be overfit (I say "should" because sampling error might be present, but if your model has sufficient power you should be good). If you use MSE to evaluate the model, note that the MSE from the ANOVA table in PROC REG is a different quantity than the out-of-sample MSE you'd compute here. And on that note, the BIC that PROC REG reports is the Sawa criterion, not the Schwarz criterion (different formulas) — Schwarz's appears as SBC.
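The split-train-compare loop above can be sketched end to end. This is an illustrative Python example on synthetic data with a hand-rolled one-variable OLS fit (not the poster's actual model): seed the RNG, split 70/30, fit on the training split, and compare train vs. test MSE as the overfitting check.

```python
# Sketch of a seeded 70/30 split with a train/test MSE comparison.
import random

random.seed(42)  # seed for repeatability, as suggested

# Synthetic data: y = 2x + 1 plus noise (made up for illustration)
xs = [random.uniform(0, 10) for _ in range(200)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in xs]

random.shuffle(data)
cut = int(0.7 * len(data))          # 70% train, 30% test
train, test = data[:cut], data[cut:]

# Ordinary least squares for y = b0 + b1*x on the training split only
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
b1 = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
b0 = my - b1 * mx

def mse(rows):
    """Mean squared prediction error of the fitted line on the given rows."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in rows) / len(rows)

print(mse(train), mse(test))  # similar values suggest the model is not overfit
```

If the test MSE were much larger than the train MSE, that would be the overfitting signal; here both should hover near the noise variance.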
Michael