Hello, rselukar! Thank you for the reply.

I still struggle to understand what you mean by "introducing new observations with missing values" to create an equally spaced time series. Here is what I am dealing with. The raw data does not include observations for holidays and weekends.

Stage 1. Here is an extract from the series:

29.04.19 6619    28.04.18 6437    28.04.17 5631
30.04.19 6637    03.05.18 6381    02.05.17 5586
06.05.19 6583    04.05.18 6389    03.05.17 5585
07.05.19 6622    07.05.18 6388    04.05.17 5634
08.05.19 6650    08.05.18 6407    05.05.17 5702
13.05.19 6611    10.05.18 6434    10.05.17 5669
14.05.19 6614    11.05.18 6470    11.05.17 5700
15.05.19 6637    14.05.18 6476    12.05.17 5750
16.05.19 6665    15.05.18 6493    15.05.17 5741

The sample covers the period with the May holidays. As you can see, the holidays make it difficult to create an equally spaced time series.

Stage 2. To tackle the issue, I tried to follow your advice by introducing observations with missing values. Here is what I got:

28.04.2017 5631,384    28.04.2018 6437,385    28.04.2019 #NA
29.04.2017 #NA         29.04.2018 #NA         29.04.2019 6619,458
30.04.2017 #NA         30.04.2018 #NA         30.04.2019 6637,063
01.05.2017 #NA         01.05.2018 #NA         01.05.2019 #NA
02.05.2017 5586,474    02.05.2018 #NA         02.05.2019 #NA
03.05.2017 5584,853    03.05.2018 6381,419    03.05.2019 #NA
04.05.2017 5633,576    04.05.2018 6389,251    04.05.2019 #NA
05.05.2017 5702,383    05.05.2018 #NA         05.05.2019 #NA
06.05.2017 #NA         06.05.2018 #NA         06.05.2019 6582,803
07.05.2017 #NA         07.05.2018 6388,149    07.05.2019 6621,953
08.05.2017 #NA         08.05.2018 6407,099    08.05.2019 6650,353
09.05.2017 #NA         09.05.2018 #NA         09.05.2019 #NA
10.05.2017 5669,212    10.05.2018 6434,303    10.05.2019 #NA
11.05.2017 5700,203    11.05.2018 6470,12     11.05.2019 #NA
12.05.2017 5750,052    12.05.2018 #NA         12.05.2019 #NA
13.05.2017 #NA         13.05.2018 #NA         13.05.2019 6611,123
14.05.2017 #NA         14.05.2018 6476,379    14.05.2019 6614,203
15.05.2017 5741,485    15.05.2018 6492,975    15.05.2019 6636,823

The original code works for the data, but the UCM forecast is poor from an RMSE perspective.

Stage 3. As an alternative, I assumed there was no change in the dependent variable over the weekends, so I copied the value of the last working day before the days off. I could not find code that does this in SAS, so I resorted to the SUMIF function in Excel:

27.04.17 5611    27.04.18 6422    27.04.19 6585
28.04.17 5631    28.04.18 6422    28.04.19 6585
29.04.17 5631    29.04.18 6422    29.04.19 6619
30.04.17 5631    30.04.18 6418    30.04.19 6637
01.05.17 5624    01.05.18 6418    01.05.19 6637
02.05.17 5586    02.05.18 6385    02.05.19 6637
03.05.17 5585    03.05.18 6381    03.05.19 6637
04.05.17 5634    04.05.18 6389    04.05.19 6637
05.05.17 5702    05.05.18 6389    05.05.19 6637
06.05.17 5702    06.05.18 6389    06.05.19 6583
07.05.17 5702    07.05.18 6388    07.05.19 6622
08.05.17 5698    08.05.18 6407    08.05.19 6650
09.05.17 5698    09.05.18 6407    09.05.19 6650
10.05.17 5669    10.05.18 6434    10.05.19 6650
11.05.17 5700    11.05.18 6470    11.05.19 6650
12.05.17 5750    12.05.18 6470    12.05.19 6650
13.05.17 5750    13.05.18 6470    13.05.19 6611
14.05.17 5750    14.05.18 6476    14.05.19 6614
15.05.17 5741    15.05.18 6493    15.05.19 6637

I ran PROC UCM again, but once more I failed to improve my forecast.

Here are my questions with respect to my current situation:

1) Does "introducing observations with missing values" stand for what I did in Stage 2?
2) What is the code for transforming the original irregular date series into a regular series with missing values?
3) Is there code that would allow me to copy the value of the last working day into the days off that follow it? Is there anything in SAS similar to the Excel SUMIF function?
4) Once you have added holidays and weekends, how do you assign an index variable or SAS time-ID variable instead of the imported date variable?

Please supply the code. I tried this:

PROC IMPORT
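    DATAFILE="&dir.decomp_UCM.xlsx" DBMS=XLSX OUT=ttt REPLACE;
    GETNAMES=YES;
RUN;

/* Keep only date and cash, and attach formats and informats */
DATA ttt;
    SET ttt;
    LENGTH date 8 cash 8;
    KEEP date cash;
    FORMAT date DATE9. cash F12.4;
    INFORMAT date DATE9. cash BEST12.;
RUN;

/* The INDEX statement needs the CREATE keyword; this builds a simple index on cash */
PROC DATASETS LIBRARY=work;
    MODIFY ttt;
    INDEX CREATE cash;
RUN;
QUIT;

/* Only date and cash exist after the KEEP above, so cash_abs is replaced by cash */
PROC SORT
    DATA=ttt(KEEP=date cash)
    OUT=ttt;
    BY date;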
RUN;

5) Suppose we have reached the point where the dataset is equally spaced. How do I specify the season and cycle? The season length does not accept decimal values; it requires an integer. You also mentioned that cycles can be specified for both weekly and monthly patterns. Is the following code correct?

proc ucm data=ttt;
    id date interval=day;
    model cash;
    outlier maxnum=30;
    level plot=smooth;
    slope plot=smooth;
    season length=245.6 type=trig keeph=2 to 12 by 1 print=harmonics plot=(FILTER SMOOTH);
    cycle period=4.79 noest=(period);
    cycle period=20.58 noest=(period);
    estimate back=0 plot=panel;
    forecast skipfirst=3000 back=0 lead=&days_to_predict plot=decomp;
run;

Still, the automatic UCM procedure yields unsatisfactory results. I am desperate to get a good approximation. Would you suggest trying the Lex Jansen UCM procedure by hand, meaning building and estimating the state and signal equations without resorting to the automatic solution? Would you recommend any literature on that?

@rselukar wrote:

I am trying to see how best to answer your questions. Here are my comments:

1. The ARIMA, UCM, AUTOREG (and VARMAX for the multivariate setting) procedures in SAS assume that the observations are collected at (logically) equally spaced time points. Therefore, the actual index variable used internally is always the observation number. The SAS time-ID variable, if supplied, is used only to label the observations (and to provide an additional check that the observations are properly ordered). In particular, this means that my suggestion to create a time series of equally spaced observations applies to ARIMA as well as UCM modeling (and to your ARDL modeling also).

2. I know it will be tedious to create a true equally spaced time series for your situation, but it will be useful to come as close to it as easily possible (it is perfectly OK to have embedded missing response values if you are using PROC UCM or PROC ARIMA).

3. Once you have such a time series, you are ready to use PROC UCM. If you think that the series does not have a seasonal pattern with an integer period but has approximately periodic patterns, then you can include one or more cycle components (start with one or two). Start with a smooth trend (such as a local linear trend with the disturbance variance of the level set to zero). This helps in the identification of cycles. You can also use regression variables to take account of the holidays or other special events. Initially, do not add ARMA orders in the IRREGULAR statement (an ARMA component can act like a cycle component and complicate the cycle identification). After reasonable cycle components are identified, you can add a low-order ARMA part (say p=1 or q=1) to the IRREGULAR statement.

Let's see if this works.

If you can, please provide code to illustrate your suggestions.
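
For reference, here is a minimal sketch of how the "equally spaced series with embedded missing values" from point 2 might be built. It rests on assumptions: PROC TIMESERIES (SAS/ETS) is used as the padding tool, which the quoted reply does not name; the input is the ttt data set with a numeric SAS date variable date and the variable cash, with no duplicate dates; and the output names ttt_daily and ttt_locf are hypothetical.

```sas
/* PROC TIMESERIES expects the time ID in ascending order */
proc sort data=ttt;
    by date;
run;

/* Variant A: pad to a daily grid; weekends and holidays become rows
   with a missing cash value (what Stage 2 does by hand) */
proc timeseries data=ttt out=ttt_daily;
    id date interval=day accumulate=none setmissing=missing;
    var cash;
run;

/* Variant B: same daily grid, but each gap is filled with the previous
   observed value (what Stage 3 does in Excel) */
proc timeseries data=ttt out=ttt_locf;
    id date interval=day accumulate=none setmissing=previous;
    var cash;
run;
```

Variant A keeps the gaps as missing responses, which point 2 says PROC UCM and PROC ARIMA accept. Variant B treats the carried-forward values as real observations, so the model cannot tell them apart from measured data.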
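
A possible starting specification along the lines of point 3 is sketched below. It is only one reading of that advice, not a confirmed solution: it assumes ttt has already been padded to a regular daily grid, that the macro variable days_to_predict is defined as in the code above, and that two cycles with freely estimated periods are a reasonable first attempt. Holiday regressors are left out because their values would also have to be supplied for the forecast horizon.

```sas
proc ucm data=ttt;
    id date interval=day;
    model cash;
    irregular;                 /* plain irregular term, no ARMA orders yet */
    level variance=0 noest;    /* level disturbance variance fixed at zero */
    slope;                     /* with the fixed level this gives a smooth trend */
    cycle;                     /* first cycle, period estimated from the data */
    cycle;                     /* second cycle, period estimated from the data */
    estimate;
    forecast lead=&days_to_predict;
run;
```

If the estimated cycle periods look plausible, a low-order ARMA part can then be tried in the IRREGULAR statement, as the quoted reply suggests.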