If Y is regressed on X1 only, R-square is 0.72 and the coefficient of X1 is significant.
If Y is regressed on X2 only, R-square is 0.002 and the coefficient of X2 is NOT significant.
If Y is regressed on X1 and X2, R-square is 0.76 and the coefficients of both X1 and X2 are significant.
Questions:
1. Why does X2 become significant when it is combined with X1? Is there an intuitive explanation for this?
2. Should I use X2 in the final model?
1) One scenario is that Y is highly correlated with X1, and that X2 and Y are nearly orthogonal. That would mean that X1 explains Y very well, but X2 does not. However, after you fit Y to X1, it might be that the RESIDUALS are predicted by X2!
2) I'll let others discuss whether you should include X2. You should probably look at the adjusted R-square to see if there is incremental value in choosing the more complicated model.
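For the adjusted R-square comparison in point 2, here is a minimal Python sketch of the usual formula. The sample size n = 100 is a made-up placeholder, since N is not stated in this thread; the two R-square values are the ones reported in the question.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-square for a model with p predictors (intercept not counted) fit to n rows."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

n = 100  # hypothetical sample size; the thread does not give N

adj_x1 = adjusted_r2(0.72, n, 1)    # Y on X1 only
adj_both = adjusted_r2(0.76, n, 2)  # Y on X1 and X2

print(f"adjusted R-square, X1 only:   {adj_x1:.4f}")
print(f"adjusted R-square, X1 and X2: {adj_both:.4f}")
```

If the adjusted value for the two-predictor model is still clearly higher, the extra term is earning its penalty. With a different N the numbers change, so plug in the real one.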
What happens when you add the interaction term X1*X2?
The adjusted R-square is the same: R-square = 0.7598, Adj R-square = 0.7581.
The interaction term X1*X2 is NOT significant.
The fitted model using X1 and X2 is
Y = 0.5 + 1.2*X1 + 1.1*X2 + e
This equation tells me that if X2 increases by one unit while X1 remains constant, Y will increase by 1.1. Is this a wrong conclusion given that when Y is regressed on X2 only, X2 is not significant?
It could be that not all regressions are performed on the same data. What is the pattern of missing values in X1 and X2, i.e. what are the Ns of the three regressions? - PG
I just checked. All three regressions have the same N, and there are no missing values in X1 or X2.
Read about the phenomenon called "suppression". One article that describes it is the following:
Lynn HS. Suppression and confounding in action. The American Statistician 2003 Feb;57(1):58-61.
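To make suppression concrete, here is a small constructed example in plain Python (standard library only). The data are invented for illustration and are not the poster's data; the helper `ols` and the vectors `s`, `u`, `e` are hypothetical names. X1 is built as signal plus nuisance, X2 carries only the nuisance (with opposite sign), and Y is signal plus error, so X2 is exactly uncorrelated with Y yet cleans up X1 when both are in the model. The sketch reports R-square only and does not compute p-values.

```python
def ols(y, *xs):
    """Least-squares fit of y on the predictors plus an intercept.

    Returns (coefficients, R-square, residuals); coefficients[0] is the intercept.
    """
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in x] for x in xs]
    k = len(cols)
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [vr - f * vp for vr, vp in zip(a[r], a[p])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * c[t] for b, c in zip(coefs, cols)) for t in range(n)]
    resid = [yt - ft for yt, ft in zip(y, fitted)]
    ybar = sum(y) / n
    sse = sum(r * r for r in resid)
    sst = sum((yt - ybar) ** 2 for yt in y)
    return coefs, 1.0 - sse / sst, resid

# Three mutually orthogonal, zero-mean patterns (toy values).
s = [3, 3, 3, 3, -3, -3, -3, -3]  # signal shared by Y and X1
u = [1, 1, -1, -1, 1, 1, -1, -1]  # nuisance that contaminates X1
e = [1, -1, 1, -1, 1, -1, 1, -1]  # error term in Y

y = [si + ei for si, ei in zip(s, e)]
x1 = [si + ui for si, ui in zip(s, u)]
x2 = [-ui for ui in u]

coef_x1, r2_x1, resid_x1 = ols(y, x1)
coef_x2, r2_x2, _ = ols(y, x2)
coef_both, r2_both, _ = ols(y, x1, x2)
# Residuals from the X1-only fit, regressed on X2.
_, r2_resid, _ = ols(resid_x1, x2)

print(f"R-square, X1 only:          {r2_x1:.3f}")
print(f"R-square, X2 only:          {r2_x2:.3f}")
print(f"R-square, X1 and X2:        {r2_both:.3f}")
print(f"R-square, residuals on X2:  {r2_resid:.3f}")
print("coefficients with both:", [round(b, 3) for b in coef_both])
```

On this toy data X2 alone explains nothing, but adding it to X1 raises R-square from 0.81 to 0.90, and both slopes come out as 1. The X2 coefficient is still read as the change in Y per unit of X2 with X1 held fixed; it just says nothing about what X2 does on its own.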