This topic is solved and locked.
FaridNasrAlDeen
Fluorite | Level 6

Hello everyone!

I am desperate for advice after a lot of browsing through the SAS user guide and the Community. I use SAS Enterprise Guide 7.1 (64-bit).

My time series has the following characteristics:

1) daily data;

2) irregular (no weekends or holidays included), so I have a 5-day week, but some weeks have 6 days and holiday weeks are shorter;

3) the data display intra-week, monthly, quarterly and annual seasonality;

4) data reflect payments made by individuals. 

I came across a Lex Jansen paper with a very clear overview of UCM. One of its examples forecasts the Dow Jones index with PROC UCM. I noticed that the code implies a 5-day week. Please see the code:

 

proc ucm data=dow plot=all;
   id date interval=weekday;      /* Monday-to-Friday calendar */
   model close;
   level;
   slope;
   season type=dummy length=5;    /* day-of-week pattern over a 5-day week */
run;
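For reference, this is what that SEASON statement asks for. With `interval=weekday`, consecutive observations are consecutive Monday-to-Friday slots, and a dummy seasonal component of length $s$ is usually written as below. The notation ($\gamma_t$ for the seasonal effect at observation $t$, $\omega_t$ for its disturbance) is the standard UCM notation, assumed here rather than taken from the paper:

$$\sum_{i=0}^{s-1} \gamma_{t-i} = \omega_t, \qquad \omega_t \sim \text{i.i.d. } N(0, \sigma^2_\omega), \qquad s = 5$$

In other words, the five day-of-week effects sum to approximately zero over any five consecutive observations, so the pattern is tied to the order of the observations rather than to calendar dates.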

This caught my attention because when I make my data regular by applying PROC EXPAND to get a 7-day week, I seem to lose crucial information about consumers' behavioral patterns.

In addition, my forecast has a very high standard error: 0.5% of the actual value (which is a lot in this case). Moreover, the forecast tends to show much longer harmonics, which suggests that the interpolation PROC EXPAND performs makes the data inconsistent with their original pattern.

Here is my code:

%let days_to_predict = 5;        /* forecast horizon */
%let dir=C:/home/Far/;           /* folder for input and output files */
ods graphics on;
goptions device= ACTXIMG;
ods pdf file="&dir.Far.pdf";
PROC IMPORT  
DATAFILE= "&dir.UCM.xlsx" DBMS=XLSX OUT= ttt REPLACE;
GETNAMES=YES;
RUN;
/* Insert the missing calendar days (weekends, holidays) and fill them by spline interpolation */
PROC EXPAND DATA=ttt OUT=ttt FROM = DAY
	ALIGN = BEGINNING
	METHOD = SPLINE(NOTAKNOT, NOTAKNOT) 
	PLOT=(ALL SERIES)
	OBSERVED = (BEGINNING, BEGINNING);
id date;
CONVERT dough;
RUN;
DATA ttt;
set ttt;
    LENGTH
        date               8
        dough           8 ;
    KEEP
        date
        dough ;
    FORMAT
        date             DATE9.
        dough         F12.4 ;
    INFORMAT
        date             DATE9.
        dough         BEST12. ;
RUN;
PROC SORT
	DATA=ttt(KEEP=date dough)
	OUT=ttt;
	BY date;
RUN;	
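data ttt;
set ttt;
wd = weekday(date);                                  /* day of week: 1=Sunday ... 7=Saturday */
dy = day(date);                                      /* day of month */
b_ny = exp(-(MDY(12,31,Year(date))-date)**2/40);     /* bump peaking on 31 December */
a_ny = exp(-(date-MDY(12,31,Year(date)-1))**2/40);   /* bump decaying after the previous 31 December */
/* the two lines below are comment statements and are not executed */
**qt = qtr(date);
**may = exp(-(MDY(5,5,Year(date))-date)**2/20);
run;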
proc ucm data=ttt;
   id date interval=day;
   model dough=wd b_ny a_ny; **may qt;
   outlier maxnum=30;
   level plot=smooth;
   slope plot=smooth;
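   season length=365 type=trig keeph=2 to 12 by 1 print=harmonics plot=(FILTER SMOOTH);  /* harmonics 2-12 of a 365-observation season */
   cycle period=7 noest=(period);   /* cycle with period fixed at 7 observations */
   irregular p=3 q=3;               /* ARMA(3,3) irregular component */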
   estimate back=0 plot=panel;
   forecast skipfirst=3000 back=0 lead=&days_to_predict plot=decomp;
run;
ods graphics off;
ods pdf close;
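The two New Year regressors in the DATA step above are Gaussian-shaped bumps. Writing $D_Y$ for 31 December of the year that contains date $t$ (notation introduced here only for illustration), the code computes:

$$b_{ny}(t) = \exp\!\left(-\frac{(D_Y - t)^2}{40}\right), \qquad a_{ny}(t) = \exp\!\left(-\frac{(t - D_{Y-1})^2}{40}\right)$$

Matching this to $\exp\!\left(-x^2 / (2\sigma^2)\right)$ gives $2\sigma^2 = 40$, so $\sigma = \sqrt{20} \approx 4.47$ days: $b_{ny}$ equals 1 on 31 December, about $e^{-0.625} \approx 0.54$ five days earlier and about $e^{-2.5} \approx 0.08$ ten days earlier, while $a_{ny}$ mirrors this shape after the turn of the year.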

I therefore have three crucial questions:

1) Is there a way to work with irregular data using proc UCM?

2) How can I improve my forecast if I get longer harmonics with higher amplitude?

3) Is there code that, instead of the usual interpolation with PROC EXPAND, would let me expand my dates to include weekends and then copy the observation from the previous working day? Say I have irregular daily data (a normal 5-day week with some exceptions for the holiday season). I want to expand my dates to get a 7-day weekly seasonality. Then, for Saturdays and Sundays, I would like to have the same value as was observed on Friday.

 

Again, the priority is to learn how to deal with irregular time series. But if that is impossible, I would take any advice that would help me build a more precise forecast.

1 ACCEPTED SOLUTION

rselukar
SAS Employee

First, a few comments about your UCM code:

1.  The LENGTH= value in the SEASON statement must be an integer.

2. Usually it is a good idea to include a simple noise component (IRREGULAR) in the model.

 

A good book for UCMs: Pelagatti, M. M. (2015). Time Series Modelling with Unobserved Components. Boca Raton, FL: CRC Press.

 

It is not easy for me to check your data pattern.  Try to see whether your data fit one of the "weekday" interval patterns supported by SAS (see the section https://go.documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_intervals_toc.htm&docsetVersion=... ).  If holidays fall within these intervals, you will need to insert them into your data with a missing value for your response (close).  After this, your series will be reasonably regular.  At least initially, don't specify periods in your cycles; let the procedure estimate them.  Similarly, include the SEASON statement only if you have at least four complete seasons.  (Why are you skipping the first harmonic?)
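A minimal sketch of the WEEKDAY-grid idea, assuming one row per working day with a SAS date variable `date` and a response `cash` (the poster's variable name). The sample values are the 2019 figures from the May extract posted in this thread; `weekday_grid` and `ttt_regular` are hypothetical names. Weekdays without an observation (the holidays) end up with a missing `cash`. Note that `if in_grid;` drops any working Saturday (for example 28.04.2018 in that extract), so such days would need separate handling.

```sas
/* Sample: 2019 working-day values from the extract in this thread */
data ttt;
   input date :ddmmyy10. cash;
   format date date9.;
   datalines;
29.04.2019 6619
30.04.2019 6637
06.05.2019 6583
07.05.2019 6622
08.05.2019 6650
13.05.2019 6611
14.05.2019 6614
15.05.2019 6637
16.05.2019 6665
;
run;

proc sort data=ttt;
   by date;
run;

/* First and last dates of the series */
proc sql noprint;
   select min(date), max(date)
      into :first_day trimmed, :last_day trimmed
      from ttt;
quit;

/* Every Monday-Friday date in that range (WEEKDAY(): 1=Sunday, 7=Saturday) */
data weekday_grid;
   format date date9.;
   do date = &first_day to &last_day;
      if weekday(date) not in (1, 7) then output;
   end;
run;

/* Weekday holidays get a missing CASH; rows outside the grid are dropped */
data ttt_regular;
   merge weekday_grid(in=in_grid) ttt;
   by date;
   if in_grid;
run;
```

After this step, `id date interval=weekday;` in PROC UCM matches the spacing of `ttt_regular`, and a season of length 5 corresponds to one Monday-to-Friday week.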


rselukar
SAS Employee

Dealing with data that may have complex seasonal or cyclical patterns can be difficult.  If you are able to create a time series of equally spaced observations from your original time series, you can use the UCM procedure for such a task.  You might need to insert new observations with missing response values or delete some observations (such as holidays) to ensure that successive observations are "equally" spaced.  If, while doing this, the associated time ID variable cannot be assigned a proper date interval, you can simply use the observation number as the time ID.  This process should not require any interpolation (e.g., the use of PROC EXPAND).  Then you can explore the natural periodicities in your data, which might be different from 7 or 365.  Let me know how this works.
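A minimal sketch of using the observation number as the time index, assuming the series has already been made equally spaced (here by simply dropping the holidays, one of the two options mentioned above). The values are the 2019 working-day figures from the May extract posted in this thread; `cash_series` and `cash_indexed` are hypothetical names. PROC UCM does not require an ID statement, so the model below runs purely in observation order; the counter `t` is only a convenience for labelling and merging.

```sas
/* Equally spaced working-day values (holidays dropped) */
data cash_series;
   input cash @@;
   datalines;
6619 6637 6583 6622 6650 6611 6614 6637 6665
;
run;

/* Observation number as an explicit time index */
data cash_indexed;
   set cash_series;
   t = _n_;
run;

/* No ID statement: the procedure works in observation order */
proc ucm data=cash_indexed;
   model cash;
   irregular;
   level;
run;
```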

FaridNasrAlDeen
Fluorite | Level 6

Hello, dear rselukar!

I apologize for the late reply; I was busy trying ARDL modelling in SAS, which distracted me from PROC UCM. Thank you for the advice on using the observation number as the time ID. I will try this option now.

As to your suggestion to build an equally spaced series by inserting new observations or deleting holidays, I am afraid it is quite difficult to achieve. Every year the working-day calendar is modified depending on the weekday the holiday falls on. For example, if Independence Day falls on a Thursday in a given year, there are 4 days off (Thursday, Friday, Saturday and Sunday), but if in another year it falls on a Tuesday, the break does not extend to the weekend and people get only 1 day off (Tuesday). Therefore, different years have slightly different lengths.

My original data excludes public holidays and weekends. 

The problem I now see is that if I use the observation number as the time ID, I lose part of the relevant information for ARIMA modelling, which is a preliminary estimation step before I proceed to UCM.

My average week lasts 4.79 working days, a month 20.58 days, and a year 51.2 weeks or 245.6 days. I will also introduce these figures into my current UCM parameters.

What is your assessment of the extent to which UCM estimates might be inconsistent due to a standardized approach to time identification (i.e., assigning each observation a serial number rather than a real date)?

Is there a guide to building a UCM by hand? I am not sure that I understand the procedure for estimating the harmonics. Understanding the inner workings of the model would help me customize it for specific tasks.
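On the harmonics question, a sketch of the usual trigonometric seasonal formulation in UCMs (standard notation, assumed here rather than quoted from the SAS documentation): a season of length $s$ is the sum of $\lfloor s/2 \rfloor$ harmonics, each a rotating pair of states at frequency $\lambda_j$, and $s$ counts observations, not calendar days.

$$\gamma_t = \sum_{j=1}^{\lfloor s/2 \rfloor} \gamma_{j,t}, \qquad \lambda_j = \frac{2\pi j}{s}$$

$$\begin{pmatrix}\gamma_{j,t} \\ \gamma^{*}_{j,t}\end{pmatrix} = \begin{pmatrix}\cos\lambda_j & \sin\lambda_j \\ -\sin\lambda_j & \cos\lambda_j\end{pmatrix} \begin{pmatrix}\gamma_{j,t-1} \\ \gamma^{*}_{j,t-1}\end{pmatrix} + \begin{pmatrix}\omega_{j,t} \\ \omega^{*}_{j,t}\end{pmatrix}$$

Harmonic $j$ repeats every $s/j$ observations. With `length=365`, `keeph=2 to 12` keeps periods from $365/2 = 182.5$ down to $365/12 \approx 30.4$ observations and drops $j = 1$, the full 365-observation wave. On a working-day series, 365 observations also span considerably more than a year (the figure above is about 245.6 observations per year).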

rselukar
SAS Employee

I am trying to see how best to answer your questions.  Here are my comments:

1.  The ARIMA, UCM, AUTOREG (and VARMAX for multivariate setting) procedures in SAS assume that the observations are collected at (logically) equally spaced  time points.  Therefore, the actual index variable used internally is always the observation number.  The SAS time-ID variable, if supplied, is used only to label the observations (and to provide an additional check to see if the observations are properly ordered).  In particular, this means that my suggestion to create a time series of equally spaced observations applies to ARIMA as well as UCM modeling (and for your ARDL modeling also).

2.  I know it will be tedious to create a truly equally spaced time series for your situation, but it will be useful to come as close to it as reasonably possible (it is perfectly OK to have embedded missing response values if you are using PROC UCM or PROC ARIMA).

3.  Once you have such a time series, you are ready to use PROC UCM.  If you think that the series does not have seasonal pattern with integer period but has approximate periodic patterns then you can include one or more cycle components (start with one or two).  Start with a smooth trend (such as local linear trend with disturbance variance of level set to zero).  This helps in the identification of cycles.  You can also use regression variables to take account of the holidays or other special events.  Initially do not add ARMA orders in the IRREGULAR statement (ARMA component can act like a cycle component and complicate the cycle identification).  After reasonable cycle components are identified, you can add lower order ARMA part (say p=1 or q=1) to the IRREGULAR. 
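A minimal sketch of the starting model described in point 3: a smooth trend (local linear trend with the level disturbance variance fixed at zero), one cycle whose period is estimated, and a plain IRREGULAR with no ARMA terms. Because no series is attached here, it runs on SIMULATED data (trend plus a wave with period 20.58 observations, echoing the average month length mentioned above, plus noise), so the output says nothing about the real payments data. The names `sim` and `fc` are hypothetical. Holiday regressors are left out to keep it short; if added to MODEL, their future values would generally also be needed for the forecast period.

```sas
/* SIMULATED series for illustration only (not the payments data): */
/* linear trend + wave with period 20.58 observations + noise      */
data sim;
   call streaminit(2019);
   pi = constant('pi');
   do t = 1 to 600;
      cash = 6000 + 0.5*t + 20*sin(2*pi*t/20.58) + rand('normal', 0, 5);
      output;
   end;
   drop pi;
run;

proc ucm data=sim;
   model cash;                /* regression variables could be added here    */
   irregular;                 /* plain noise: no ARMA terms at first          */
   level variance=0 noest;    /* level disturbance fixed at zero ...          */
   slope;                     /* ... so the trend is smooth                   */
   cycle;                     /* period, damping and variance are estimated   */
   estimate;
   forecast lead=5 outfor=fc;
run;
```

Once the cycle estimate looks sensible, a second `cycle;` statement can be tried, and after that a low-order ARMA term such as `irregular p=1;`, in the order suggested in point 3.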

 

Let's see if this works.

 

FaridNasrAlDeen
Fluorite | Level 6

Hello, rselukar!

Thank you for the reply. I still struggle to understand what you mean by "introducing new observations with missing values" to create an equally spaced time series. Here is what I am dealing with: the raw data do not include observations for holidays and weekends.

Stage 1. Here is an example:

29.04.19661928.04.18643728.04.175631
30.04.19663703.05.18638102.05.175586
06.05.19658304.05.18638903.05.175585
07.05.19662207.05.18638804.05.175634
08.05.19665008.05.18640705.05.175702
13.05.19661110.05.18643410.05.175669
14.05.19661411.05.18647011.05.175700
15.05.19663714.05.18647612.05.175750
16.05.19666515.05.18649315.05.175741

 

This is an extract from the series covering the May holidays. As you can see, the holidays make it difficult to create an equally spaced time series.

 

Stage 2. To tackle the issue, I tried to follow your advice by introducing observations with missing values. Here is what I got:

 

Date (2017) | Value | Date (2018) | Value | Date (2019) | Value
28.04.2017 | 5631.384 | 28.04.2018 | 6437.385 | 28.04.2019 | #NA
29.04.2017 | #NA | 29.04.2018 | #NA | 29.04.2019 | 6619.458
30.04.2017 | #NA | 30.04.2018 | #NA | 30.04.2019 | 6637.063
01.05.2017 | #NA | 01.05.2018 | #NA | 01.05.2019 | #NA
02.05.2017 | 5586.474 | 02.05.2018 | #NA | 02.05.2019 | #NA
03.05.2017 | 5584.853 | 03.05.2018 | 6381.419 | 03.05.2019 | #NA
04.05.2017 | 5633.576 | 04.05.2018 | 6389.251 | 04.05.2019 | #NA
05.05.2017 | 5702.383 | 05.05.2018 | #NA | 05.05.2019 | #NA
06.05.2017 | #NA | 06.05.2018 | #NA | 06.05.2019 | 6582.803
07.05.2017 | #NA | 07.05.2018 | 6388.149 | 07.05.2019 | 6621.953
08.05.2017 | #NA | 08.05.2018 | 6407.099 | 08.05.2019 | 6650.353
09.05.2017 | #NA | 09.05.2018 | #NA | 09.05.2019 | #NA
10.05.2017 | 5669.212 | 10.05.2018 | 6434.303 | 10.05.2019 | #NA
11.05.2017 | 5700.203 | 11.05.2018 | 6470.12 | 11.05.2019 | #NA
12.05.2017 | 5750.052 | 12.05.2018 | #NA | 12.05.2019 | #NA
13.05.2017 | #NA | 13.05.2018 | #NA | 13.05.2019 | 6611.123
14.05.2017 | #NA | 14.05.2018 | 6476.379 | 14.05.2019 | 6614.203
15.05.2017 | 5741.485 | 15.05.2018 | 6492.975 | 15.05.2019 | 6636.823

The original code runs on these data, but the UCM forecast is poor in terms of RMSE. As an alternative, I assumed that the dependent variable does not change over weekends.

 

Stage 3. So I copied the value of the last working day before each day off into the days off. I could not find SAS code that does this, so I resorted to the SUMIF function in Excel:

Date (2017) | Value | Date (2018) | Value | Date (2019) | Value
27.04.17 | 5611 | 27.04.18 | 6422 | 27.04.19 | 6585
28.04.17 | 5631 | 28.04.18 | 6422 | 28.04.19 | 6585
29.04.17 | 5631 | 29.04.18 | 6422 | 29.04.19 | 6619
30.04.17 | 5631 | 30.04.18 | 6418 | 30.04.19 | 6637
01.05.17 | 5624 | 01.05.18 | 6418 | 01.05.19 | 6637
02.05.17 | 5586 | 02.05.18 | 6385 | 02.05.19 | 6637
03.05.17 | 5585 | 03.05.18 | 6381 | 03.05.19 | 6637
04.05.17 | 5634 | 04.05.18 | 6389 | 04.05.19 | 6637
05.05.17 | 5702 | 05.05.18 | 6389 | 05.05.19 | 6637
06.05.17 | 5702 | 06.05.18 | 6389 | 06.05.19 | 6583
07.05.17 | 5702 | 07.05.18 | 6388 | 07.05.19 | 6622
08.05.17 | 5698 | 08.05.18 | 6407 | 08.05.19 | 6650
09.05.17 | 5698 | 09.05.18 | 6407 | 09.05.19 | 6650
10.05.17 | 5669 | 10.05.18 | 6434 | 10.05.19 | 6650
11.05.17 | 5700 | 11.05.18 | 6470 | 11.05.19 | 6650
12.05.17 | 5750 | 12.05.18 | 6470 | 12.05.19 | 6650
13.05.17 | 5750 | 13.05.18 | 6470 | 13.05.19 | 6611
14.05.17 | 5750 | 14.05.18 | 6476 | 14.05.19 | 6614
15.05.17 | 5741 | 15.05.18 | 6493 | 15.05.19 | 6637
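The Stage 3 carry-forward (last observation carried forward) can be done in SAS without Excel. A minimal sketch, assuming a data set `ttt` with `date` and `cash`, one row per working day; the sample values are the 2019 figures from the Stage 1 extract, and `all_days`, `ttt_daily` and `cash_filled` are hypothetical names. The RETAINed variable keeps the last non-missing value, so every weekend and holiday receives the value of the preceding working day.

```sas
/* Sample: 2019 working-day values from the Stage 1 extract */
data ttt;
   input date :ddmmyy10. cash;
   format date date9.;
   datalines;
29.04.2019 6619
30.04.2019 6637
06.05.2019 6583
07.05.2019 6622
08.05.2019 6650
13.05.2019 6611
14.05.2019 6614
15.05.2019 6637
16.05.2019 6665
;
run;

proc sort data=ttt;
   by date;
run;

proc sql noprint;
   select min(date), max(date)
      into :first_day trimmed, :last_day trimmed
      from ttt;
quit;

/* One row for every calendar day in the range */
data all_days;
   format date date9.;
   do date = &first_day to &last_day;
      output;
   end;
run;

/* CASH stays missing on days off; CASH_FILLED carries the last value forward */
data ttt_daily;
   merge all_days(in=in_cal) ttt;
   by date;
   if in_cal;
   retain cash_filled;
   if not missing(cash) then cash_filled = cash;
run;
```

Keeping `cash` next to `cash_filled` makes it easy to compare this variant with the missing-value version, which PROC UCM can handle directly.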

I ran PROC UCM again, but once again I failed to improve my forecast.

 

Here are my questions with respect to my current situation:

1) Does "introducing observations with missing values" mean what I did in Stage 2?

2) What code transforms the original irregular date series into a regular series with missing values?

3) Is there code that would copy the value of the last working day into the following days off? Is there anything similar to the Excel SUMIF function in SAS?

4) Once holidays and weekends have been added, how do you assign an index variable or SAS time-ID variable instead of the imported date variable? Please supply the code. I tried this:

PROC IMPORT  
DATAFILE= "&dir.decomp_UCM.xlsx" DBMS=XLSX OUT= ttt REPLACE;
GETNAMES=YES;
RUN;
DATA ttt;
set ttt;
    LENGTH
        date               8
        cash           8 ;
    KEEP
        date
        cash ;
    FORMAT
        date             DATE9.
        cash        F12.4 ;
    INFORMAT
        date             DATE9.
        cash         BEST12. ;
RUN;
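/* INDEX CREATE builds a data set index for fast WHERE/BY lookups; */
/* it does not create a time index for PROC UCM                    */
proc datasets library=work nolist;
   modify ttt;
      index create cash;
run;
quit;
/* FORCE is needed to sort an indexed data set in place */
PROC SORT
	DATA=ttt(KEEP=date cash)
	OUT=ttt FORCE;
	BY date;
RUN;	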

5) Suppose we have reached the point where the data set is equally spaced. How do I specify the season and cycle?

The SEASON statement does not accept non-integer lengths; it requires an integer. You also mentioned that cycles can be used for both weekly and monthly patterns. Is the following code correct?

proc ucm data=ttt;
   id date interval=day;
   model cash;
   outlier maxnum=30;
   level plot=smooth;
   slope plot=smooth;
   season length=245.6 type=trig keeph=2 to 12 by 1 print=harmonics plot=(FILTER SMOOTH);
   cycle period=4.79 noest=(period);
   cycle period=20.58 noest=(period);
   estimate back=0 plot=panel;
   forecast skipfirst=3000 back=0 lead=&days_to_predict plot=decomp;
run;

The automatic UCM procedure still yields unsatisfactory results, and I am desperate to get a good approximation. Would you suggest working through the UCM procedure from the Lex Jansen paper by hand, that is, building and estimating the state and signal equations without resorting to the automatic solution? Would you recommend any literature on that?



If you can, please provide code to illustrate your suggestions.



Discussion stats
  • 5 replies
  • 1603 views
  • 1 like
  • 2 in conversation