Dear all,
I have daily data on how many people entered a certain shopping center and on the weather (temperature) that day. I would like to find out whether there is a relationship between the weather and the number of people who entered the shopping center.
In addition, I have covariates such as the average income in that region.
The problem is that the covariates, such as mean income, are monthly, not daily. So my main dependent and independent variables form a daily time series, while the covariates are monthly.
How should I handle this situation?
I have thought of several options, but I am not sure which is best:
1. Aggregate the daily variables to monthly means. The drawback is that I will lose information.
2. Expand the monthly data to daily, i.e., give every day in a month the same income value. This would lead to a model with random effects, wouldn't it?
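Both options can be prepared with a short data step, PROC MEANS, and PROC SQL. This is a minimal sketch under assumed, hypothetical names: a daily data set shopping (date, NPeople, temp) and a monthly data set income_monthly (month, income) in which month is stored as the SAS date of the first day of the month. The intermediate data sets shopping_m, monthly_means, and shopping_daily are also illustrative.

```sas
/* Assumed inputs (hypothetical names):                          */
/*   shopping       : one row per day   (date, NPeople, temp)    */
/*   income_monthly : one row per month (month, income), where   */
/*                    month is the first day of the month        */

/* Tag every day with the first day of its month */
data shopping_m;
  set shopping;
  month = intnx('month', date, 0);
  format month monyy7.;
run;

/* Option 1: collapse the daily series to monthly means */
proc means data=shopping_m noprint nway;
  class month;
  var NPeople temp;
  output out=monthly_means(drop=_type_ _freq_) mean=;
run;

/* Option 2: repeat each month's income on every day of that month */
proc sql;
  create table shopping_daily as
  select d.*, m.income
  from shopping_m as d
       left join income_monthly as m
       on d.month = m.month
  order by d.date;
quit;
```

For option 1, monthly_means would still need the same join on month to bring in income; for option 2, shopping_daily has one row per day with income constant within each month.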
How would you handle this problem, and which model would you use (regression, time series, mixed model)?
Thank you in advance!
(Using SAS 9.4)
From your description, I feel that a time series model could be a reasonable choice. Time series models can capture a time-varying level, day-of-the-week seasonality, and regression effects such as temperature and monthly income. The fact that monthly income is constant across the days of a month is not particularly troublesome, as long as it is an informative predictor for the overall series. You could use procedures such as ARIMA, AUTOREG, or UCM in SAS/ETS for this analysis. Just to get you started, here is a sample program for PROC UCM. Assume that your daily data are stored in a data set "shopping" that has the following columns: date, NPeople, temp, and income.
proc ucm data=shopping;
id date interval=day;
model NPeople = temp income;
/* specifies a smooth trend component */
level variance=0 noest plot=smooth;
slope;
/* specifies a day of the week component */
season length=7 type=trig plot=smooth;
/* noise component */
irregular;
/* residual diagnostics */
estimate plot=panel;
forecast plot=(forecasts decomp);
run;
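For comparison, a rough PROC AUTOREG counterpart to the UCM program could look like the sketch below: an ordinary regression on temp and income with autoregressive errors, using day-of-week indicator variables in place of the trigonometric season. It is illustrative, not tuned: the data set shopping_dow, the indicators dow2-dow7, and the choice NLAG=7 are assumptions, and unlike the UCM program it has no time-varying trend.

```sas
/* Hypothetical day-of-week indicators; Sunday is the reference day */
data shopping_dow;
  set shopping;
  dow = weekday(date);          /* 1 = Sunday ... 7 = Saturday */
  array d{6} dow2-dow7;
  do i = 1 to 6;
    d{i} = (dow = i + 1);       /* 1 on that weekday, else 0 */
  end;
  drop i;
run;

/* Regression with AR(7) errors, estimated by maximum likelihood */
proc autoreg data=shopping_dow;
  model NPeople = temp income dow2-dow7 / nlag=7 method=ml;
run;
```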
In your example, the temp effect could be nonlinear. You could capture that by using the SPLINEREG statement (see "Example 42.6 Using Splines to Incorporate Nonlinear Effects" in the PROC UCM documentation).
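If you first want a simpler check for curvature than SPLINEREG, one swapped-in alternative (a quadratic polynomial, not the spline technique from the documentation example) is to add a centered, squared temperature term to the same UCM regression. The macro variable tbar, the data set shopping2, and the variables temp_c and temp_c2 are hypothetical names.

```sas
/* Mean temperature, stored in a macro variable for centering */
proc sql noprint;
  select mean(temp) into :tbar from shopping;
quit;

/* Centered linear and quadratic temperature terms */
data shopping2;
  set shopping;
  temp_c  = temp - &tbar;   /* centering reduces collinearity with temp_c2 */
  temp_c2 = temp_c**2;
run;

/* Same components as before, with the quadratic term added */
proc ucm data=shopping2;
  id date interval=day;
  model NPeople = temp_c temp_c2 income;
  level variance=0 noest plot=smooth;
  slope;
  season length=7 type=trig plot=smooth;
  irregular;
  estimate plot=panel;
run;
```

A clearly nonzero coefficient on temp_c2 would suggest that the spline approach is worth pursuing.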
Hope this helps.
Do you find monthly variations in the regional income? To me, it could very well be treated as a constant, in which case you may use temperature alone. If there are wide monthly variations, you could group income into 4 or 5 groups, which leads to an Analysis of Covariance (ANCOVA).
Cheers,
DATASP
I see what you mean, but even if I group it, the same problem remains: income is still constant for every day within a month. Temperature and the number of people vary by day, while income varies only by month.
What I had in mind was this:
Suppose you have made 3 groups. Then you fit a linear regression of the number of people on temperature for each group.
Compare the slopes and intercepts across groups using Analysis of Covariance.
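The grouping-plus-ANCOVA approach could be sketched as follows, assuming a daily data set shopping with columns date, NPeople, temp, and income (the monthly income repeated on each day). PROC RANK splits the days into 3 income groups, and PROC GLM fits a separate intercept and temperature slope for each group; the Type III test for incgrp*temp checks whether the slopes differ. The names shopping_grp and incgrp are hypothetical. Note that PROC GLM treats the days as independent observations, so any day-to-day autocorrelation in the counts is ignored here.

```sas
/* Split days into 3 income groups; days sharing a month's income */
/* share the same value and therefore land in the same group.     */
proc rank data=shopping out=shopping_grp groups=3;
  var income;
  ranks incgrp;   /* 0 = low, 1 = middle, 2 = high income */
run;

/* ANCOVA: group-specific intercepts and temperature slopes */
proc glm data=shopping_grp;
  class incgrp;
  model NPeople = incgrp temp incgrp*temp / solution;
run;
quit;
```

If the incgrp*temp interaction is not significant, dropping it gives the usual parallel-slopes ANCOVA, where only the intercepts differ by income group.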