Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Forecasting
- /
- PROC UCM Blockseason statement - I cannot get it to detect multiple se...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-27-2018 10:47 AM
(1157 views)

I created an hourly data set with both time of day and day of week seasonality to test the utilization of season with blockseason. The data should have a constant mean and no trend. It contains a year's worth of data. A sample week is shown in the chart and the time of day and day of week seasonality is easily observed. The data is in the attached csv.

I used the call center example from SAS ETS documentation as a guide of how to specify my model (http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_ucm_sect042.ht...).

The resulting forecast has variation by hour, but not by day of week; the significance table shows that the block seasonality is not modeled.

This is my code. Any ideas on why I do not get the expected result?

```
ods graphics on;
proc ucm data=dat3;
id id_var interval=dthour1;
model count2;
irregular plot=smooth;
level plot=smooth noest variance=0;
season length=24 plot=smooth type=trig;
blockseason nblocks=7 blocksize=24 plot=smooth type=trig;
forecast back=356 lead=356 print=none outfor=outfor;
nloptions tech=dbldog;
run;
```

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I ran your program. The estimates of the disturbance variances associated with the season and block-season components, 0.00380 and 7.88579, respectively, are significant. This implies that these components are time-varying. The components significance table you are looking at is based on the state at the final time point of your data set (as is stated in the title of that table). For time-varying components the information in this table is not very relevant because it is only about that last snapshot of your data. That is, it does not imply that the block-season component as a whole is insignificant. When I look at the plot of the block-season component, its amplitude varies over time. In many sections of the data it is quite high, around 50 (whereas your data range is 0 to 121). So the block-season component is clearly explaining good portion of the data variation in certain sections of your data.

Hope this helps.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I did notice that it was significant in the other table. My ultimate concern is not these metrics, but that the forecast was not changing by day of the week.

I just tried specifying blockseason as deterministic and the results are much better.

My heuristic for the specification of the time series components is usually:

Evaluate final state: use a cutoff to decide whether or not to include in model. If yes, then

Evaluate final estimates of free parameters table: use a cutoff to determine whether or not to leave as stochastic or change to deterministic.

The process in this case was to do the reverse. When I surveyed the literature I could find on proc ucm usage, what I described above was how others used it most of the time. Is this heuristic not always best? Should you look at both tables simultaneously to determine the appropriate specification?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

*Time Series Modelling with Unobserved Components*. Boca Raton, FL: CRC Press.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I was able to get better results by reviewing, and then dropping, some harmonics. I hadn't thought to do this the first time around as I've normally used type=dummy. In particular, I dropped 6, 8 and 10 even though the p values were not particularly high (but were higher).

This would complicate matters if I introduced this into our current model process, as we automatically specify the model. I would need to build logic that would figure out which harmonics to drop/keep according to certain rules (some p value cutoff, allowing it to iterate several times to check the results and remove again); and have those rules consistently result in modeling the seasonality the way it should be modeled (this is just one example, will I even have this problem with production data? If so, will this be the solution?)

I'm curious now as to whether or not this would be common, to be able to drop a few harmonics on the season and have it make a dramatic difference in the blockseason component.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Modeling hourly (or even more frequent) electricity load is a well-studied problem and many approaches are suggested. Many of these approaches can be handled using the UCM procedure. At this time I am using this forum for illustrating one such approach discussed in Case Study 3 in Chapter 9 of "Time Series Modelling with Unobserved Components," by

Pelagatti, M. M. (2015). Boca Raton, FL: CRC Press. The electricity load data set used in this example is taken from

http://www.ucmbook.info/ , which the website associated with the book.

I am also including a link to one of my workshops on UCM and SSM procedures. This is a link to the slides of a workshop at

"The 36th International Symposium on Forecasting Santander, Spain | 19-22 June 2016":

https://forecasters.org/wp-content/uploads/gravity_forms/7-621289a708af3e7af65a7cd487aee6eb/2016/07/...

See the example "Hourly Electricity Load in Italy" in the Illustrations section for additional explanations.

Note that each PROC UCM call can take up to 15 minutes to finish (processes 9 years of hourly data).

/* ------------------Program-----------------------------------------------*/

libname wk 'C:\udrive\ucmBook\AX2016\programs'; * directory location of the elettroload data set;

data test(drop=h1-h24);

set wk.elettroload;

array hr{24} h1-h24;

do hour=1 to 24;

eload = hr[hour];

dthour = intnx('dthour', dhms(date, 0, 0, 0), hour-1);

output;

end;

format dthour datetime.;

run;

proc sort data=test;

by hour date;

run;

data test(drop=j twoPi yf days year);

set test;

array cosf{16} cos1-cos16;

array sinf{16} sin1-sin16;

year = year(date);

yf = mdy(1, 1, year);

days = date - yf; *day within that year;

twoPi = 2*constant('pi')/365.25;

do j=1 to 16;

cosf[j] = cos(twoPi*j*days);

sinf[j] = sin(twoPi*j*days);

end;

run;

/* Analysis as per Case Study 3 in Chapt 9 of this book. */

proc ucm data=test;

*where hour=1;

id date interval=day;

by hour;

irregular;

level;

cycle period=7 rho=1 noest=(period rho);

cycle period=3.5 rho=1 noest=(period rho);

cycle period=2.3333 rho=1 noest=(period rho);

randomreg cos1-cos16 sin1-sin16;

model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat

easterSun easterMon easterTue holidays holySat

holySun bridgeDay endYear;

estimate back=14 plot=panel;

forecast back=14 lead=14 outfor=loadfor;

run;

*An alternate formulation that uses a trigonometric season component with

first 16 harmonics to model the day-of-the-year pattern;

title "Model that uses parsimonious trigonometric season";

proc ucm data=test;

*where hour=1;

id date interval=day;

by hour;

irregular;

level;

cycle period=7 rho=1 noest=(period rho);

cycle period=3.5 rho=1 noest=(period rho);

cycle period=2.3333 rho=1 noest=(period rho);

season length=365 type=trig keeph=1 to 16 by 1;

model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat

easterSun easterMon easterTue holidays holySat

holySun bridgeDay endYear;

estimate back=14 plot=panel;

forecast back=14 lead=14 outfor=loadfor1;

run;

proc sgplot data=loadfor1;

title "Day of the year pattern for hour=1 in 2006";

where year(date) = 2006 && hour=1;

series x=date y=s_season;

run;

title;

*Yet another formulation that uses spline season to model

the day-of-the-year pattern;

title "Model that uses spline season";

proc ucm data=test;

*where hour=1;

id date interval=day;

by hour;

irregular;

level;

cycle period=7 rho=1 noest=(period rho);

cycle period=3.5 rho=1 noest=(period rho);

cycle period=2.3333 rho=1 noest=(period rho);

splineseason length=365 degree=3 knots=4 15 30 45 60 75 90 105 120 135 150 165

180 195 210 225 240 255 270 285 300 315

330 345 360;

model eload = dec24 dec25 dec26 jan1 jan6 aug15 easterSat

easterSun easterMon easterTue holidays holySat

holySun bridgeDay endYear;

estimate back=14 plot=panel;

forecast back=14 lead=14 outfor=ucmfor2;

run;

proc sgplot data=ucmfor2;

title "Day-of-the-year pattern for hour=1 in 2006";

where year(date) = 2006 && hour=1;

series x=date y=s_splseas;

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Some notes about the program:

Since I could not attach the SAS data set elettroload, I have attached a csv file. You will need to read it into a SAS data set and put it in the directory pointed by the libname statement at the start. Before using the data in PROC UCM, a few more steps need to be done:

Data prepration for the analysis with PROC UCM.

Step 1: Read the hourly load readings from the data set "elettroload" and store them in

a new data set, "Test". Test has three columns of interest: Date, Hour, eload (hourly electricity load for that date).

The dthour column is not used the in analysis.

Step 2: Sort Test by Hour and Date so that Hour can be used as a BY group. In effect, we deal with 24 daily series.

Step 3: Add some sinsoidal regressors to the data set. See the explanation in the book.

Since I could not attach the SAS data set elettroload, I have attached a csv file. You will need to read it into a SAS data set and put it in the directory pointed by the libname statement at the start. Before using the data in PROC UCM, a few more steps need to be done:

Data prepration for the analysis with PROC UCM.

Step 1: Read the hourly load readings from the data set "elettroload" and store them in

a new data set, "Test". Test has three columns of interest: Date, Hour, eload (hourly electricity load for that date).

The dthour column is not used the in analysis.

Step 2: Sort Test by Hour and Date so that Hour can be used as a BY group. In effect, we deal with 24 daily series.

Step 3: Add some sinsoidal regressors to the data set. See the explanation in the book.

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.