Hello, I have a question about handling the TIME variable in my time series analysis. Hope someone can help 🙂
Below is my data:
data sales;
input Week weekday Target;
datalines;
1 4 539.577
1 5 224.675
1 6 129.412
2 2 317.12
2 3 210.517
2 4 207.364
2 5 263.043
2 6 248.958
3 2 344.291
3 3 248.428
3 4 281.42
3 5 243.568
3 6 308.178
4 2 363.402
4 3 336.872
4 4 246.992
4 5 308.88
4 6 233.126
5 2 404.38
1 3 298.56
1 4 229.249
1 5 236.304
1 6 297.174
2 2 409.401
2 3 231.035
2 4 238.826
2 5 235.598
2 6 242.112
3 2 490.79
3 3 289.657
3 4 298.459
3 5 323.603
4 3 616.453
4 4 346.035
4 5 307.645
4 6 253.847
5 2 530.944
5 3 333.359
5 4 306.356
1 6 416.83
2 2 415.187
2 3 268.002
2 4 234.503
2 5 234.724
2 6 230.064
3 2 357.394
3 3 259.246
3 4 244.235
3 5 402.607
3 6 255.061
4 2 342.606
4 3 268.64
4 4 188.601
4 5 202.022
4 6 213.509
5 2 316.849
5 3 286.412
5 4 303.447
5 5 304.95
5 6 331.9
;
run;
So basically, I have three columns. The first two are the WEEK and WEEKDAY columns. The WEEK column contains the week number and the second column contains the day of the week (e.g. 2=Monday, 3=Tuesday, 4=Wednesday...etc).
Just as an example, the first observation has WEEK=1 and WEEKDAY=4. This is the time point for the Wednesday in week 1.
I'm modeling the TARGET variable using Proc Arima and Proc ESM. I believe I need one TIME variable for the x-axis. There are in total 60 observations. I did a quick check and saw that the response is substantially higher on Monday and lower on the rest of the week.
I'm wondering how I can establish the TIME variable that reflects the weekday information. For example, if I treat the TIME simply as something from 1 to 60, I will lose the seasonal component of the data (because there are some missing days due to holiday).
I believe I have two options:
I'm fairly new to time series. Does anyone have any comment about how this situation should be handled?
Thanks!
The best way would be to create a date variable to use for the time axis. You will need rows for the Saturday and Sunday dates with missing values for the target variable.
Then you can include weekly differencing in estimate statements for fitting the model. The following will evaluate daily differences and weekly.
identify var=target; estimate p=(1 7);
In the forecasting stage, you can use the ID and INTERVAL options to specify it is daily data.
The ID and INTERVAL can be specified in the FORECAST statement, for example,
forecast id=date interval=day;
You shouldn't need a dummy variable for weekday after doing this, at least in PROC ARIMA. It should see if weekly differencing helps with the model based on the INTERVAL=DAY option.
You can also find more information here:
PROC ESM can be approached similarly, except the ID variable is specified in its own statement, for example:
proc esm data=<input-data-set> out=<output-data-set>; id <time-ID-variable> interval=<frequency>; forecast <time-series-variables>; run;
Here is another link to the PROC ESM documentation
The best way would be to create a date variable to use for the time axis. You will need rows for the Saturday and Sunday dates with missing values for the target variable.
Then you can include weekly differencing in estimate statements for fitting the model. The following will evaluate daily differences and weekly.
identify var=target; estimate p=(1 7);
In the forecasting stage, you can use the ID and INTERVAL options to specify it is daily data.
The ID and INTERVAL can be specified in the FORECAST statement, for example,
forecast id=date interval=day;
You shouldn't need a dummy variable for weekday after doing this, at least in PROC ARIMA. It should see if weekly differencing helps with the model based on the INTERVAL=DAY option.
You can also find more information here:
PROC ESM can be approached similarly, except the ID variable is specified in its own statement, for example:
proc esm data=<input-data-set> out=<output-data-set>; id <time-ID-variable> interval=<frequency>; forecast <time-series-variables>; run;
Here is another link to the PROC ESM documentation
So there are several ways to approach this, but your data set is missing necessary information for creating a date variable. You can easily convert these, but you have data that lists five weeks, and then repeats the same weeks, which leads me to the following questions about your data: Are there by variables? Do you know the year? Are these repeated observations for the same set of weeks, or are these different years or just by groups?
You could also use seasonality=5 on the proc statement, which would tell it that there is a seasonality of 5, as you would guess, but I'm not sure how you're using PROC ESM or ARIMA without a date variable?
I have drafted a little bit of code that could help you, but I am making the assumption that the year is 2017 for all of your date variables and your data is not sorted by week, thus you have repeated observations. You will notice there will be a large gap in data if this spans multiple years since there are 52 weeks in a year and you only have five.
I added this code above your "datalines" statement
the first line will get the date for the Sunday of the given week, you then add the number of days to get the actual day. There might be a more sophisticated way to do this with multipliers and shifts, but this was the easiest. I added a format to show you the dates will give you the weekday as given. More information on this is given in the ETS doc chapter "Date Intervals, Formats, and Functions"
datestart=intnx('week','01JAN2017'd,week);
actualdate=intnx('day',datestart,weekday-1);
format datestart actualdate weekdate.;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.