kurofufu
Calcite | Level 5

The monthly sales amount of a retail store (S) is positively correlated with the number of visiting customers (C). The regression of S on C alone has a positive regression coefficient for C. But after I add some additional variables, the regression coefficient for C remains significant but becomes negative. Should I keep C in the model? If so, the interpretation of the effect of C will not make sense at all. Any ideas?

2 REPLIES
PaigeMiller
Diamond | Level 26

Sign switching of regression coefficients is a known drawback of ordinary least squares regression when the predictor variables are correlated with one another.
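
To make the sign switch concrete, here is a minimal sketch in plain Python (no SAS needed). The data are invented for illustration: A is a hypothetical added variable that tracks C very closely, and S is constructed as 5 - 1*C + 3*A, so the "true" coefficient of C is negative even though S and C are positively correlated. The helper names `solve` and `ols` are my own, not part of any library.

```python
# Hypothetical data: A is an invented second predictor, almost collinear with C.
C = [10, 20, 30, 40, 50, 60]               # visiting customers
A = [11, 19, 29, 41, 51, 59]               # added variable that tracks C closely
S = [5 - c + 3 * a for c, a in zip(C, A)]  # sales built as 5 - 1*C + 3*A

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(predictors, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    cols = [[1.0] * len(y)] + [[float(v) for v in p] for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

b_simple = ols([C], S)     # S regressed on C alone
b_multi = ols([C, A], S)   # S regressed on C and A together

print("S on C alone: C coefficient = %.3f" % b_simple[1])
print("S on C and A: C coefficient = %.3f, A coefficient = %.3f"
      % (b_multi[1], b_multi[2]))
```

With C alone, the coefficient comes out positive (roughly 1.98) because C is standing in for A. Once A enters the model, the C coefficient measures the effect of C with A held fixed, and it drops to -1.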

What should you do about this? There are several choices: remove the correlated predictor variables; use the final model for prediction only, not for interpretation; or change your estimation method to Partial Least Squares (PROC PLS in SAS), which is less susceptible to the sign-switching problem and produces regression coefficients with smaller mean squared error than ordinary least squares does when the predictor variables are correlated.
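
Before removing anything, it helps to measure how strongly the predictors are correlated. The sketch below, again in plain Python with invented data, computes the Pearson correlation between C and a hypothetical added variable A, and the variance inflation factor, which for exactly two predictors reduces to 1 / (1 - r^2). A VIF above about 10 is a common rule of thumb for troublesome collinearity, not a hard cutoff. In SAS, the VIF option on the MODEL statement of PROC REG reports the same kind of diagnostic.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: A is an invented added variable that tracks C closely.
C = [10, 20, 30, 40, 50, 60]
A = [11, 19, 29, 41, 51, 59]

r = pearson(C, A)
# With only two predictors, each one's VIF is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

print("correlation between C and A: %.4f" % r)
print("variance inflation factor:   %.1f" % vif)
```

With more than two predictors, r^2 is replaced by the R-squared from regressing each predictor on all the others.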

--
Paige Miller
1zmm
Quartz | Level 8

Instead of total monthly sales (S) as your dependent variable, you could model average monthly sales per customer (S/C). 
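
A minimal sketch of that idea in plain Python, using made-up numbers: compute S/C for each month, then regress it on one of the other variables. Here P is a hypothetical promotion indicator that I introduced only for illustration.

```python
# Hypothetical monthly data; P is an invented 0/1 promotion indicator.
C = [100, 120, 150, 160, 200, 220]         # visiting customers
S = [1000, 1200, 1800, 1920, 2000, 2200]   # total monthly sales
P = [0, 0, 1, 1, 0, 0]

# New dependent variable: average sales per customer.
per_customer = [s / c for s, c in zip(S, C)]

def simple_fit(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    return my - b * mx, b

intercept, slope = simple_fit(P, per_customer)
print("sales per customer:", per_customer)
print("intercept = %.2f, promotion effect = %.2f" % (intercept, slope))
```

Because C is now in the denominator of the response, it no longer has to compete with the other predictors for the same variation in S.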

You could also examine which of the independent variables added to the regression of S on C change the sign of the regression coefficient of C. What combination of these independent variables changes the sign, and why? Perhaps such an examination would identify subgroups based on levels of these independent variables, or interactions between C and these independent variables, that might be worth further study and modelling.
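
One simple way to start that examination is to fit S on C separately within each level of an added variable and compare the slopes with the pooled slope. The sketch below uses invented data in which a hypothetical store-size category ("small" or "large") drives both C and S, so the pooled slope is positive while the slope within each group is negative.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((u - mx) * (v - my) for u, v in zip(x, y))
            / sum((u - mx) ** 2 for u in x))

# Hypothetical records: (store-size group, customers C, sales S).
stores = [
    ("small", 10, 30), ("small", 20, 25), ("small", 30, 20),
    ("large", 40, 70), ("large", 50, 65), ("large", 60, 60),
]

# Slope of S on C ignoring the grouping variable.
pooled = slope([c for g, c, s in stores], [s for g, c, s in stores])

# Slope of S on C within each level of the grouping variable.
by_group = {}
for group in sorted({g for g, c, s in stores}):
    xs = [c for g, c, s in stores if g == group]
    ys = [s for g, c, s in stores if g == group]
    by_group[group] = slope(xs, ys)

print("pooled slope of S on C: %.3f" % pooled)
for group, b in by_group.items():
    print("slope within %s stores: %.3f" % (group, b))
```

If the within-group slopes differ a lot from one another, that points toward an interaction between C and the grouping variable.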
