01-31-2013 08:39 PM
The monthly sales amount of a retail store (S) is positively correlated with the number of visiting customers (C). The regression of S on C alone has a positive regression coefficient for C. But after I add some additional variables, the regression coefficient for C is still significant but becomes negative. Should I include C in the model? If so, the interpretation of the effect of C will not make sense at all. Any ideas?
02-01-2013 09:43 AM
Sign switching of regression coefficients is a known drawback of ordinary least squares regression when you have correlated predictor variables.
What should you do about this? There are several choices: remove the correlated predictor variables; use the final model for prediction only, not for interpretation; or change your estimation method to Partial Least Squares (PROC PLS in SAS), which is less susceptible to the sign-switching problem and, with correlated predictor variables, produces regression coefficients that have smaller mean squared error than ordinary least squares does.
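To make the sign switch concrete, here is a minimal sketch in plain Python (not SAS) on made-up data. Everything in it is an assumption for illustration: `A` is a hypothetical second predictor that tracks C closely, the numbers are invented, and `ols` is a small hand-written helper rather than any library routine. S is built as exactly 3*A - C, so C has a negative coefficient once A is in the model, even though S and C are positively correlated on their own.

```python
def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Invented data: A is a hypothetical predictor strongly correlated with C.
A = [1, 2, 3, 4, 5, 6, 7, 8]
C = [1.5, 1.5, 3.5, 3.5, 4.5, 6.5, 6.5, 8.5]
S = [3 * a_i - c_i for a_i, c_i in zip(A, C)]  # sales built as 3*A - C

simple = ols([C], S)       # S ~ C
multiple = ols([C, A], S)  # S ~ C + A

print("coefficient of C, S ~ C    :", round(simple[1], 3))    # positive
print("coefficient of C, S ~ C + A:", round(multiple[1], 3))  # negative
```

In the one-predictor fit, C also carries the effect of the omitted A, which is why its coefficient comes out positive there. Neither sign is "wrong"; they answer different questions.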
02-02-2013 02:39 PM
Instead of total monthly sales (S) as your dependent variable, you could model average monthly sales per customer (S/C).
You could also examine which of the independent variables added to the regression of S on C change the sign of the regression coefficient of C. What combination of these independent variables changes the sign, and why? Perhaps such an examination would identify subgroups based on levels of these independent variables, or interactions between C and these independent variables, that might be worth further study and modelling.
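One way to run that kind of examination is to add the candidate variables one at a time and watch the coefficient of C. The sketch below does this in plain Python on invented data. The candidate variables `A` and `W` are hypothetical, and `partial_slope` is an illustrative helper: it gets the coefficient of C in a two-predictor model by regressing the residuals of S on the residuals of C, after both have had the added variable removed (the Frisch-Waugh result).

```python
def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple regression on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def partial_slope(c, s, z):
    """Coefficient of c in the model s ~ c + z, via residual-on-residual regression."""
    return slope(residuals(z, c), residuals(z, s))

# Invented data; A and W are hypothetical added variables.
A = [1, 2, 3, 4, 5, 6, 7, 8]              # tracks C closely
W = [1, 0, 0, 1, 0, 1, 1, 0]              # unrelated 0/1 flag
C = [1.5, 1.5, 3.5, 3.5, 4.5, 6.5, 6.5, 8.5]
S = [3 * a_i - c_i for a_i, c_i in zip(A, C)]

baseline = slope(C, S)  # S ~ C alone
flipped = []
for name, z in [("A", A), ("W", W)]:
    b = partial_slope(C, S, z)
    print(f"S ~ C + {name}: coefficient of C = {b:.3f}")
    if b * baseline < 0:
        flipped.append(name)

print("variables that flip the sign of C:", flipped)
```

With more than a few candidates the same loop can be run over pairs or larger subsets, but each such fit needs a full multiple regression rather than this one-variable shortcut.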