Hi,
Very nice problem, thank you for sharing it.
The first thing I would do is build a time-series model for the sales of each store.
The simplest such model contains a single explanatory variable: the count of nearby stores. This variable changes over time, of course, as stores open.
The parameter estimate for this variable would be the "Nearby_Store_Open" effect.
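As a rough sketch of how that count variable could be built (in Python, only to illustrate the logic; a DATA step would do the same job), assuming you have an opening date for each nearby store. The store labels and dates below are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical opening dates of stores near the store being modelled.
nearby_openings = {
    "A": date(2005, 1, 10),
    "B": date(2006, 6, 5),
    "C": date(2007, 3, 19),
}

def nearby_store_count(obs_date, openings):
    """Number of nearby stores already open on obs_date."""
    return sum(1 for opened in openings.values() if opened <= obs_date)

# Eight weekly observations starting 1 May 2006 (illustrative only).
weeks = [date(2006, 5, 1) + timedelta(weeks=i) for i in range(8)]
counts = [nearby_store_count(w, nearby_openings) for w in weeks]

for w, c in zip(weeks, counts):
    print(w.isoformat(), c)
```

The resulting column is what you would pass to the model as the regressor.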
As a next model, instead of the count variable, I would use dummy variables.
For the time series of store 570 you don't need dummy variables for 156, 531, 406, 499, 385 - because they already existed when the store was opened, so their dummies would be 1 for the whole series and carry no information.
But when you analyse time series 156, you will have an EXIST_570 variable, which is 0 until 06/05/2006 and 1 after that.
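A minimal sketch of the dummy construction, again in Python just to show the logic. Assumptions on my side: 06/05/2006 is read as month/day/year, the opening day itself counts as 1, and the opening date used for store 531 and the observation dates are invented. Dummies that never change within the series are dropped, which is the reason store 570 needs none for the older stores.

```python
from datetime import date, timedelta

# Assumption: 06/05/2006 is read as month/day/year (5 June 2006).
OPENING_570 = date(2006, 6, 5)

def exist_dummy(obs_date, opening_date):
    """0 before the other store opens, 1 from the opening date on (assumed boundary)."""
    return 1 if obs_date >= opening_date else 0

def dummy_columns(obs_dates, openings):
    """Build EXIST_<id> columns, skipping those that are constant over the series."""
    columns = {}
    for store_id, opened in openings.items():
        col = [exist_dummy(d, opened) for d in obs_dates]
        if len(set(col)) > 1:  # a constant column adds nothing to the model
            columns[f"EXIST_{store_id}"] = col
    return columns

# Hypothetical weekly observation dates for the series of store 156.
obs_dates = [date(2006, 5, 22) + timedelta(weeks=i) for i in range(4)]
# The date for store 531 is made up; it only needs to precede the series.
openings = {"570": OPENING_570, "531": date(2001, 1, 1)}

print(dummy_columns(obs_dates, openings))
```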
To make the time-series models more precise, it may be worth including trend and seasonality in the model (if you have several years of data), or other explanatory variables (for example: the store was closed for 1 day in a given week).
Sometimes events such as a store opening have a gradual effect. This means that when a nearby store opens, at first it has only a minor effect on the sales of the first store, the next week the effect is bigger, and after some weeks it stabilizes. There are many ways to model this. Of course, try the simplest ideas above first.
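One of the many ways to represent such a gradual effect is a linear ramp that replaces the 0/1 dummy: 0 before the opening, then rising in equal weekly steps until it reaches 1. This is only a sketch; the 4-week ramp length is an arbitrary assumption, and the opening date is read as month/day/year.

```python
from datetime import date

def ramp_effect(obs_date, opening_date, ramp_weeks=4):
    """Linear ramp: 0 before opening, then 1/ramp_weeks per week open, capped at 1."""
    if obs_date < opening_date:
        return 0.0
    # The opening week counts as week 1.
    weeks_open = (obs_date - opening_date).days // 7 + 1
    return min(weeks_open, ramp_weeks) / ramp_weeks

# Assumption: the nearby store opens on 5 June 2006.
opening = date(2006, 6, 5)
for d in [date(2006, 5, 29), date(2006, 6, 5), date(2006, 6, 12),
          date(2006, 6, 19), date(2006, 6, 26), date(2006, 7, 3)]:
    print(d.isoformat(), ramp_effect(d, opening))
```

With this variable the estimated coefficient is the fully stabilized effect, and the ramp length is something you would have to choose or compare across a few candidate values.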
If you have SAS/ETS, I would start by looking at PROC AUTOREG (simplest), PROC ARIMA, PROC UCM (most complex) - these are capable of using explanatory variables (regressors).
With PROC PANEL, PROC VARMAX, PROC SSM you could model all the store time series in one model. That would be a very nice model, but I wouldn't suggest it to you right now.
If you don't have SAS/ETS, you can use SAS/STAT procedures: PROC REG, PROC GLM,...
They are not native time-series procedures. Using them, you ignore lots of time-series-specific phenomena, but you can still use them and get good results.
If you still have time to play around, you could measure the proximity effect of other stores. The hypothesis would be: even within 1 mile, the store distance matters. One idea for measuring it: instead of creating 0/1 variables, you create a variable based on the distance. The formula for creating that variable... well, I still have to think about it. As long as the store does not exist, the variable should be 0. Once the store is there, the closer it is, the higher the value. If the store is very distant, the value converges to 0.
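Just to illustrate, one possible formula with these properties is an exponential decay in distance. This is a guess at a reasonable shape, not a tested recommendation; the `scale_miles` parameter and the example distances are assumptions.

```python
import math
from datetime import date

def proximity(obs_date, opening_date, distance_miles, scale_miles=0.5):
    """0 before the other store opens; afterwards exp(-distance / scale).

    Equals 1 at distance 0 and converges to 0 as the distance grows.
    scale_miles (assumed value) controls how quickly the effect fades.
    """
    if obs_date < opening_date:
        return 0.0
    return math.exp(-distance_miles / scale_miles)

# Assumption: the other store opens on 5 June 2006.
opening = date(2006, 6, 5)
after = date(2006, 7, 1)
for miles in (0.0, 0.2, 0.8):
    print(miles, round(proximity(after, opening, miles), 3))
```

Any other decreasing function of distance (for example 1 / (1 + distance)) would fit the description as well, so it is worth trying a few and comparing the model fit.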
Greg