Hello,
I have weekly point-of-sale (POS) time series data at the retail store level. The dependent variable is demand for a product, in volume. The independent variables are product price and a binary intervention dummy variable that indicates a promotion within a certain time frame. Each store also has a location and address.
What statistical analysis can determine if the promotion impacts demand/sales? Do any of you have any suggestions or recommendations? Any feedback would be very much appreciated!
Ethan
P.S. I do have SAS/ETS available.
Example: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/etsug/etsug_arima_examples04.htm
Thanks so much for your prompt response with the helpful example.
But I still don't know how to interpret and compare the results with and without the intervention based on the forecasting result.
My goal is not to forecast from the historical data. I want to set up an A/B test, and I have no idea how to define a control group and a test group.
Could you enlighten me?
Thanks,
@t75wez1 wrote:
But I still don't know how to interpret and compare the results with and without the intervention based on the forecasting result.
My goal is not to forecast from the historical data. I want to set up an A/B test, and I have no idea how to define a control group and a test group.
Google finds this: https://medium.com/mayflower-team/controlling-influence-between-groups-in-a-b-testing-interrupted-ti...
and other potentially useful links
I'm still not clear on how to implement the example and the ideas from the two articles in SAS code to design an A/B test.
I'm not sure whether the right approach is to run the ARIMA model below twice, once with X1 and once without it, and compare the results.
Could anyone shed some light on it?
Thanks,
Here is the time series data and ARIMA code:
/*--------------------------------------------------------------

                    SAS Sample Library

        Name: ariex04.sas
 Description: Example program from SAS/ETS User's Guide,
              The ARIMA Procedure
       Title: An Intervention Model for Ozone Data
     Product: SAS/ETS Software
        Keys: time series analysis
        PROC: ARIMA
       Notes:

--------------------------------------------------------------*/

title1 'Intervention Data for Ozone Concentration';
title2 '(Box and Tiao, JASA 1975 P.70)';

data air;
   input ozone @@;
   label ozone  = 'Ozone Concentration'
         x1     = 'Intervention for post 1960 period'
         summer = 'Summer Months Intervention'
         winter = 'Winter Months Intervention';
   date = intnx( 'month', '31dec1954'd, _n_ );
   format date monyy.;
   month = month( date );
   year = year( date );
   x1 = year >= 1960;
   summer = ( 5 < month < 11 ) * ( year > 1965 );
   winter = ( year > 1965 ) - summer;
datalines;
2.7 2.0 3.6 5.0 6.5 6.1 5.9 5.0 6.4 7.4 8.2 3.9
4.1 4.5 5.5 3.8 4.8 5.6 6.3 5.9 8.7 5.3 5.7 5.7
3.0 3.4 4.9 4.5 4.0 5.7 6.3 7.1 8.0 5.2 5.0 4.7
3.7 3.1 2.5 4.0 4.1 4.6 4.4 4.2 5.1 4.6 4.4 4.0
2.9 2.4 4.7 5.1 4.0 7.5 7.7 6.3 5.3 5.7 4.8 2.7
1.7 2.0 3.4 4.0 4.3 5.0 5.5 5.0 5.4 3.8 2.4 2.0
2.2 2.5 2.6 3.3 2.9 4.3 4.2 4.2 3.9 3.9 2.5 2.2
2.4 1.9 2.1 4.5 3.3 3.4 4.1 5.7 4.8 5.0 2.8 2.9
1.7 3.2 2.7 3.0 3.4 3.8 5.0 4.8 4.9 3.5 2.5 2.4
1.6 2.3 2.5 3.1 3.5 4.5 5.7 5.0 4.6 4.8 2.1 1.4
2.1 2.9 2.7 4.2 3.9 4.1 4.6 5.8 4.4 6.1 3.5 1.9
1.8 1.9 3.7 4.4 3.8 5.6 5.7 5.1 5.6 4.8 2.5 1.5
1.8 2.5 2.6 1.8 3.7 3.7 4.9 5.1 3.7 5.4 3.0 1.8
2.1 2.6 2.8 3.2 3.5 3.5 4.9 4.2 4.7 3.7 3.2 1.8
2.0 1.7 2.8 3.2 4.4 3.4 3.9 5.5 3.8 3.2 2.3 2.2
1.3 2.3 2.7 3.3 3.7 3.0 3.8 4.7 4.6 2.9 1.7 1.3
1.8 2.0 2.2 3.0 2.4 3.5 3.5 3.3 2.7 2.5 1.6 1.2
1.5 2.0 3.1 3.0 3.5 3.4 4.0 3.8 3.1 2.1 1.6 1.3
.   .   .   .   .   .   .   .   .   .   .   .
;

proc arima data=air;

   /* Identify and seasonally difference ozone series */
   identify var=ozone(12)
            crosscorr=( x1(12) summer winter ) noprint;

   /* Fit a multiple regression with a seasonal MA model */
   /* by the maximum likelihood method */
   estimate q=(1)(12) input=( x1 summer winter )
            noconstant method=ml;

   /* Forecast */
   forecast lead=12 id=date interval=month out=arimaout;

run;
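Applied to the retail question in this thread, the same intervention idea might be sketched as follows. This is only an illustration under stated assumptions: the data set name sales and the variables week, demand, price, and promo are hypothetical (they do not appear in the original posts), the data are assumed to cover a single store, and the AR(1) error order is a placeholder that should be chosen from the usual identification diagnostics.

```sas
/* Hypothetical data set SALES for one store: one row per week with */
/* WEEK (SAS date), DEMAND, PRICE, and PROMO (0/1 promotion dummy). */
proc arima data=sales;

   /* Cross-correlate demand with the candidate inputs */
   identify var=demand crosscorr=( price promo ) noprint;

   /* Regress demand on price and promo with AR(1) errors; */
   /* the AR order is an assumption, not a recommendation  */
   estimate p=1 input=( price promo ) method=ml;

run;
quit;
```

With a model of this form there is no need to fit the series twice. The parameter estimates table from the ESTIMATE statement lists an estimate, standard error, and t value for the promo input, and that test is what speaks to whether the promotion shifted demand after accounting for price and autocorrelation.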
The choice of the statistical method most appropriate for your data and goal is up to you, but from your description it sounds as though a Generalized Estimating Equations (GEE) model might be worth considering. If you have multiple measurements of demand over time for each store, you might assume that the measurements within a store are more correlated than measurements between stores. A GEE model allows you to account for correlation within clusters or subjects (stores in your case). See the examples in the PROC GEE documentation in the SAS/STAT User's Guide.
You would need to choose an appropriate distribution for your response variable (demand) and specify it in the DIST= option. If this is a strictly positive variable, then the gamma or inverse Gaussian distribution is often used, generally paired with the log link function (LINK=LOG option). Note that those two distributions do not allow for zero values.
You will also need to choose a working correlation structure, which is the form of the matrix of correlations among the measurements within stores. Because GEE estimates of the regression parameters remain consistent even when the working correlation structure is misspecified, simpler structures are often used. The LSMEANS statement can be used to compare the levels of intervention. Given all of that, the code might look something like this.
proc gee;
   class store interven;
   model demand = interven / dist=gamma link=log;
   repeated subject=store / type=exch;
   lsmeans interven / diff ilink cl;
run;
Thanks so much for the suggestion.
I'd love to explore PROC GEE, but the sales time series for each store are seasonally differenced. How do I bring time into the picture?
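One possible way to bring time into the GEE model is sketched below. It is not a definitive answer, and it rests on several assumptions: the data set store_sales and the variables week and price are hypothetical names, week is assumed to be a SAS date, and the model is fit to the raw (undifferenced) demand because a gamma response must be positive and differenced values can be negative. Seasonality is handled with a calendar-month factor instead of differencing, and an AR(1) working correlation replaces the exchangeable one.

```sas
/* Hypothetical long-format data set STORE_SALES: one row per store */
/* and week, with DEMAND > 0, PRICE, and INTERVEN (0/1 promotion).  */
data store_sales_t;
   set store_sales;
   month = month( week );   /* calendar month as a seasonal factor */
run;

proc gee data=store_sales_t;
   class store interven month week;
   model demand = interven price month / dist=gamma link=log;
   /* WITHIN= orders the repeated measurements within a store;   */
   /* TYPE=AR(1) lets the correlation decay as weeks grow apart  */
   repeated subject=store / within=week type=ar(1);
   lsmeans interven / diff ilink cl;
run;
```

Listing week in the CLASS statement is what allows it to be used as the WITHIN= effect. It is deliberately left out of the MODEL statement here, so time enters through the month factor and the working correlation only.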
Sure. I did.
Thanks,