BookmarkSubscribeRSS Feed
cau83
Pyrite | Level 9

I created an hourly data set with both time of day and day of week seasonality to test the utilization of season with blockseason. The data should have a constant mean and no trend. It contains a year's worth of data. A sample week is shown in the chart and the time of day and day of week seasonality is easily observed. The data is in the attached csv.

7days.PNG

 

I used the call center example from SAS ETS documentation as a guide of how to specify my model (http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_ucm_sect042.ht...).

 

The resulting forecast has variation by hour, but not by day of week; the significance table shows that the block seasonality is not modeled. 

fc_chart.PNG 

sig.PNG

 

This is my code. Any ideas on why I do not get the expected result?

ods graphics on;
proc ucm data=dat3;
	id id_var interval=dthour1;
	model count2;
	irregular plot=smooth;
	level plot=smooth noest variance=0;
	season length=24 plot=smooth type=trig;
	blockseason nblocks=7 blocksize=24 plot=smooth type=trig;
	forecast back=356 lead=356 print=none outfor=outfor;
	nloptions tech=dbldog;
run;
		
6 REPLIES 6
rselukar
SAS Employee

I ran your program.  The estimates of the disturbance variances associated with the season and block-season components, 0.00380 and 7.88579, respectively, are significant.  This implies that these components are time-varying.  The components significance table you are looking at is based on the state at the final time point of your data set (as is stated in the title of that table).  For time-varying components the information in this table is not very relevant because it is only about that last snapshot of your data.  That is, it does not imply that the block-season component as a whole is insignificant. When I look at the plot of the block-season component, its amplitude varies over time.  In many sections of the data it is quite high, around 50 (whereas your data range is  0 to 121).  So the block-season component is clearly explaining good portion of the data variation in certain sections of your data.

 

Hope this helps.

cau83
Pyrite | Level 9

I did notice that it was significant in the other table. My ultimate concern is not these metrics, but that the forecast was not changing by day of the week. 

 

I just tried specifying blockseason as deterministic and the results are much better.

 

My heuristic for the specification of the time series components is usually:

Evaluate final state: use a cutoff to decide whether or not to include in model. If yes, then

Evaluate final estimates of free parameters table: use a cutoff to determine whether or not to leave as stochastic or change to deterministic.

 

The process in this case was to do the reverse. When I surveyed the literature I could find on proc ucm usage, what I described above was how others used it most of the time. Is this heuristic not always best? Should you look at both tables simultaneously to determine the appropriate specification?

rselukar
SAS Employee

Generally speaking one looks at the significance of the final state of a component if it appears to be time-invariant (decided by looking at the estimate of the disturbance variance and the plot of the component).  Anyway, I will give a better response towards the end of next week.  At the moment I am quite busy because I am traveling tomorrow.  By the way, for an additional example of modeling hourly data using UCM models you can also check out Case Study 3 in Chapt 9 of Pelagatti, M. M. (2015). Time Series Modelling with Unobserved Components. Boca Raton, FL: CRC Press.   

cau83
Pyrite | Level 9

I was able to get better results by reviewing, and then dropping, some harmonics. I hadn't thought to do this the first time around as I've normally used type=dummy. In particular, I dropped 6, 8 and 10 even though the p values were not particularly high (but were higher).

 

This would complicate matters if I introduced this into our current model process, as we automatically specify the model. I would need to build logic that would figure out which harmonics to drop/keep according to certain rules (some p value cutoff, allowing it to iterate several times to check the results and remove again); and have those rules consistently result in modeling the seasonality the way it should be modeled (this is just one example, will I even have this problem with production data? If so, will this be the solution?)

 

I'm curious now as to whether or not this would be common, to be able to drop a few harmonics on the season and have it make a dramatic difference in the blockseason component.

rselukar
SAS Employee

Modeling hourly (or even more frequent) electricity load is a well-studied problem and many approaches are suggested.  Many of these approaches can be handled using the UCM procedure.  At this time I am using this forum for illustrating one such approach discussed in Case Study 3 in Chapter 9 of "Time Series Modelling with Unobserved Components," by
Pelagatti, M. M. (2015). Boca Raton, FL: CRC Press.  The electricity load data set used in this example is taken from
http://www.ucmbook.info/ , which the website associated with the book.

 

I am also including a link to one of my workshops on UCM and SSM procedures.  This is a link to the slides of a workshop at
"The 36th International Symposium on Forecasting Santander, Spain | 19-22 June 2016":
https://forecasters.org/wp-content/uploads/gravity_forms/7-621289a708af3e7af65a7cd487aee6eb/2016/07/...
See the example "Hourly Electricity Load in Italy" in the Illustrations section for additional explanations.

 

Note that each PROC UCM call can take up to 15 minutes to finish (processes 9 years of hourly data).

 

/* ------------------Program-----------------------------------------------*/

libname wk 'C:\udrive\ucmBook\AX2016\programs'; * directory location of the elettroload data set;
data test(drop=h1-h24);
   set wk.elettroload;
   array hr{24} h1-h24;
   do hour=1 to 24;
    eload = hr[hour];
    dthour = intnx('dthour', dhms(date, 0, 0, 0), hour-1);
    output;
   end;
   format dthour datetime.;
run;

proc sort data=test;
   by hour date;
run;

data test(drop=j twoPi yf days year);
  set test;
  array cosf{16} cos1-cos16;
  array sinf{16} sin1-sin16;
  year = year(date);
  yf = mdy(1, 1, year);
  days = date - yf; *day within that year;
  twoPi = 2*constant('pi')/365.25;
  do j=1 to 16;
     cosf[j] = cos(twoPi*j*days);
     sinf[j] = sin(twoPi*j*days);
  end;
run;

/* Analysis as per Case Study 3 in Chapt 9 of this book. */
proc ucm data=test;
   *where hour=1;
   id date interval=day;
   by hour;
   irregular;
   level;
   cycle period=7 rho=1 noest=(period rho);
   cycle period=3.5 rho=1 noest=(period rho);
   cycle period=2.3333 rho=1 noest=(period rho);
   randomreg cos1-cos16 sin1-sin16;
   model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat
                 easterSun easterMon easterTue holidays holySat
                 holySun bridgeDay endYear;
   estimate back=14 plot=panel;
   forecast back=14 lead=14 outfor=loadfor;
run;

*An alternate formulation that uses a trigonometric season component with
 first 16 harmonics to model the day-of-the-year pattern;
title "Model that uses parsimonious trigonometric season";
proc ucm data=test;
   *where hour=1;
   id date interval=day;
   by hour;
   irregular;
   level;
   cycle period=7 rho=1 noest=(period rho);
   cycle period=3.5 rho=1 noest=(period rho);
   cycle period=2.3333 rho=1 noest=(period rho);
   season length=365 type=trig keeph=1 to 16 by 1;
   model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat
                 easterSun easterMon easterTue holidays holySat
                 holySun bridgeDay endYear;
   estimate back=14 plot=panel;            
   forecast back=14 lead=14 outfor=loadfor1;
run;

proc sgplot data=loadfor1;
   title "Day of the year pattern for hour=1 in 2006";
   where year(date) = 2006 && hour=1;
   series x=date y=s_season;
run;
title;
*Yet another formulation that uses spline season to model
the day-of-the-year pattern;
title "Model that uses spline season";
proc ucm data=test;
   *where hour=1;
   id date interval=day;
   by hour;
   irregular;
   level;
   cycle period=7 rho=1 noest=(period rho);
   cycle period=3.5 rho=1 noest=(period rho);
   cycle period=2.3333 rho=1 noest=(period rho);
   splineseason length=365 degree=3 knots=4 15 30 45 60 75 90 105 120 135 150 165
                                          180 195 210 225 240 255 270 285 300 315
                                          330 345 360;
   model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat
                 easterSun easterMon easterTue holidays holySat
                 holySun bridgeDay endYear;
   estimate back=14 plot=panel;            
   forecast back=14 lead=14 outfor=ucmfor2;
run;

proc sgplot data=ucmfor2;
   title "Day-of-the-year pattern for hour=1 in 2006";
   where year(date) = 2006 && hour=1;
   series x=date y=s_splseas;
run;

rselukar
SAS Employee
Some notes about the program:
Since I could not attach the SAS data set elettroload, I have attached a csv file. You will need to read it into a SAS data set and put it in the directory pointed by the libname statement at the start. Before using the data in PROC UCM, a few more steps need to be done:
Data prepration for the analysis with PROC UCM.

Step 1: Read the hourly load readings from the data set "elettroload" and store them in
a new data set, "Test". Test has three columns of interest: Date, Hour, eload (hourly electricity load for that date).
The dthour column is not used the in analysis.

Step 2: Sort Test by Hour and Date so that Hour can be used as a BY group. In effect, we deal with 24 daily series.
Step 3: Add some sinsoidal regressors to the data set. See the explanation in the book.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1106 views
  • 0 likes
  • 2 in conversation