EJ2
Calcite | Level 5

Hello. I'm using Visual Forecasting.

While using the ATSM package in the TSMODEL procedure, I am wondering about the difference between the following options:

 

  • diagnose.setOption('back', 6)
  • forecast.setOption('back',6)
  • diagnose.setOption('holdout', 6)
  • forecast.setOption('holdout',6)

Last time, I asked <What is the difference of diagnose.setOption('holdout') and forecast.setOption('holdout')??> and got this answer:

        

All options for diagnose object determine how to choose best models from model families; all options for forecast objects determine how to choose the best model from model selection list.

For example, in this example code, ARIMA and ESM model families are enabled, so the diagnose.setOption('holdout')  helps choose one model from ARIMA family and one model from ESM family. If you don't set this option explicitly, it will use the default value 0 for holdout. After you already have a list of models, the forecast object forecast.setOption('holdout') helps select the best model among this list. If you don't set this option explicitly, it will use the default value 0 for holdout when making this decision.

So in your case, when you want to use a non-zero holdout, you might want to specify this option in both objects to be consistent.

 

But I'm still unsure about these options and would like to know more.

 

  1. diagnose.setOption('back', 6) and forecast.setOption('back', 6): do these two differ in the same way as the holdout options?
  2. When I use the BACK or HOLDOUT options, do they forecast Y, or do they use some other method?
    1122.png

    I drew what I think BACK and HOLDOUT do. Is this right?
  3. If I use an ARIMAX model, does the BACK option forecast X, or does it use another approach (e.g., using the actual X values)?
  4. Likewise, does the HOLDOUT option forecast X, or does it use another approach?

 

Thank you 🙂

 

 

 

4 REPLIES
imvash
SAS Employee

Suppose your historical data indices are 1,...,T and you specify both holdout (H) and back (B) options.

  • The last B data points [T-B+1,...,T] form the out-of-sample region, which is used to evaluate the performance of the model.
  • The holdout region, which is used for model selection, is [T-B-H+1,...,T-B].

In general, the first T-B-H data points are used to estimate the parameters of the candidate models. The best model from each family (ARIMA, ESM, or UCM, whichever you specified in DIAGSPEC) is then selected based on its performance over the holdout region and the model selection criterion. Once a model is selected, its parameters are re-estimated using the first T-B data points. If you specify the lead option as L, the FORENG object computes forecasts for L data points starting at T-B+1. That is, your forecast horizon is [T-B+1,...,T-B+L]. If you want the forecast horizon to extend to T+L, set lead to L+B.
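The index arithmetic above can be checked with a short DATA step. This is only a sketch: the values of T, B, H, and L and all variable names are illustrative assumptions, and the step reproduces the bookkeeping described in this reply, not what the ATSM package does internally.

```sas
/* Illustrative sizes (assumptions): T history points, B = BACK, H = HOLDOUT, L = LEAD */
data regions;
	T = 36; B = 6; H = 15; L = 20;

	fit_first      = 1;             /* initial estimation of candidate models */
	fit_last       = T - B - H;
	holdout_first  = T - B - H + 1; /* region used for model selection */
	holdout_last   = T - B;
	back_first     = T - B + 1;     /* out-of-sample evaluation region */
	back_last      = T;
	forecast_first = T - B + 1;     /* forecast horizon starts right after T-B */
	forecast_last  = T - B + L;
	lead_to_reach_T_plus_L = L + B; /* LEAD value needed for the horizon to end at T+L */
run;

proc print data=regions noobs;
run;
```

With these inputs the estimation region is 1 to 15, the holdout region is 16 to 30, the back region is 31 to 36, and the forecast horizon is 31 to 50; reaching T+L = 56 would need lead = 26.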

 

As for your question about independent variables, the TSDF.AddX method by default extends them using the best-suited exponential smoothing model (STOCHASTIC), unless you provide the future values. You can override the STOCHASTIC option using the rc = TSDF.AddX('EXTEND', value) command; see the ATSM documentation for valid values.

 

I hope this helps.

EJ2
Calcite | Level 5

Thank you for your answer.

I want to make sure I understand it.

You said:

As far as your question on independent variables, TSDF.AddX method extends your independent variables using the best suited exponential smoothing model (STOCHASTIC) by default, unless you provide the future values. You can override the STOCHASTIC option using rc = TSDF.AddX('EXTEND', value) command;

 

Do these statements mean that if I provide the future values of X for an ARIMAX model, the model uses those future values of X?

 

So with the BACK option, because the future values of the independent variable are available, the model uses them to forecast Y.
And with the HOLDOUT option, because there are no future values of the independent variable, the model estimates X (using an ESM) and then forecasts Y. Is that right?

imvash
SAS Employee

Yes. If you already have future values of X, they will be used. Otherwise, they will be extended using ESM (by default).

 

See the example below. I created a random dataset that spans from 1980Q1 to 1990Q4, and used it in proc tsmodel to forecast values of Y with holdout=15, back=6 and lead=20:

 

data mydata;
	call streaminit(123);
	do i = 1980 to 1990;
		do j = 1 to 4;
			date = yyq(i,j);
			x = exp(date/10000) + 0.2*rand("Uniform");
			y = 2*x+1;
			output;
		end;
	end;
	keep date x y;
	format date yyq6.;
run;

cas mycas;
libname mycas sasioca sessref = mycas;

data mycas.mydata;
	set mydata;
	if year(date)>1988 then y = .;
run;

proc tsmodel data = mycas.mydata
	outobj = ( 
	outfor = mycas.outfor 
	outstat = mycas.outstat
	outselect = mycas.outselect
	outmodelinfo = mycas.outmodelinfo
	outest = mycas.outest
	outindep = mycas.outindep
	)
	outlog = mycas.outlog;
	
	id date interval = quarter;
	var x y;
	require atsm;

	submit;
	declare object dataFrame(tsdf);
	declare object diagnose(diagnose);
	declare object diagspec(diagspec);
	declare object forecast(foreng);
	
	rc = dataFrame.initialize();
	rc = dataFrame.addY(y);
	rc = dataFrame.addx(x,"required", "yes", "extend", "STOCHASTIC");
	
	rc = diagspec.open();
	rc = diagspec.SetARIMAX('SIGLEVEL',0.00000001);
	rc = diagspec.SetTrend('DIFF', 'none','SDIFF','none');
	rc = diagspec.close();
	
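	/* Diagnose step: BACK and HOLDOUT here affect how the best model in each family is chosen */
	rc = diagnose.initialize(dataFrame);
	rc = diagnose.setSpec(diagspec);
	rc = diagnose.setOption('BACK', 6);
	rc = diagnose.setOption('HOLDOUT', 15);
	rc = diagnose.Run();

	/* Forecast step: the same BACK and HOLDOUT values are used when selecting among the candidate models */
	rc = forecast.initialize(diagnose);
	rc = forecast.setOption('HORIZON', .);
	rc = forecast.setOption('LEAD', 20);
	rc = forecast.setOption('BACK', 6);
	rc = forecast.setOption('HOLDOUT', 15);
	rc = forecast.Run();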
	
	declare object outfor(outfor);
	declare object outstat(outstat);
	declare object outselect(outselect);
	declare object outmodelInfo(outmodelinfo);
	declare object outest(outest);
	declare object outindep(outindep);
	
	rc = outfor.collect(forecast);
	rc = outstat.collect(forecast);
	rc = outselect.collect(forecast);
	rc = outmodelInfo.collect(forecast);
	rc = outest.collect(forecast);
	rc = outindep.collect(forecast);
	endsubmit;
quit;

 

In this example, values of X are provided for 1980Q1 to 1990Q4 and values of Y for 1980Q1 to 1988Q4. The independent variable X is first extended over 1991Q1 to 1992Q2, and those extended values are then used to predict Y. The holdout region (15 data points) is 1983Q4 to 1987Q2, and the back region (6 data points) is 1987Q3 to 1988Q4. The forecast region (20 data points) starts at 1987Q3 and ends at 1992Q2.
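To see where these quarters come from, the region boundaries can be derived from the sizes used in the example (36 quarters of Y, back=6, holdout=15, lead=20). The DATA step below is a sketch with illustrative variable names; it only does date arithmetic with INTNX and does not call the ATSM package.

```sas
data example_regions;
	/* Values taken from the example: Y is observed from 1980Q1 to 1988Q4 (36 quarters) */
	first = yyq(1980, 1);
	T = 36; B = 6; H = 15; L = 20;

	/* Index k corresponds to FIRST shifted by (k-1) quarters */
	holdout_start  = intnx('quarter', first, T - B - H);     /* index T-B-H+1 */
	holdout_end    = intnx('quarter', first, T - B - 1);     /* index T-B     */
	back_start     = intnx('quarter', first, T - B);         /* index T-B+1   */
	back_end       = intnx('quarter', first, T - 1);         /* index T       */
	forecast_start = intnx('quarter', first, T - B);         /* index T-B+1   */
	forecast_end   = intnx('quarter', first, T - B + L - 1); /* index T-B+L   */

	format first holdout_start holdout_end back_start back_end
	       forecast_start forecast_end yyq6.;
	drop T B H L;
run;

proc print data=example_regions noobs;
run;
```

The printed dates should match the ranges quoted above: holdout 1983Q4 to 1987Q2, back 1987Q3 to 1988Q4, and forecast 1987Q3 to 1992Q2.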

 

As illustrated in the example, you can retrieve the extended values of the independent variable through the OUTINDEP object.
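One way to look at those extended values is to print the rows of the OUTINDEP table that fall after the last supplied X value. This is a sketch that assumes the PROC TSMODEL step above has already run in the same CAS session, and that the ID variable DATE is carried into mycas.outindep; the exact column layout of that table is not shown in this thread, so check it first (for example with PROC CONTENTS).

```sas
/* Assumption: mycas.outindep was created by the PROC TSMODEL step above */
/* and contains the time ID variable DATE.                               */
proc print data=mycas.outindep;
	/* X is supplied only through 1990Q4, so later rows should be extended values */
	where date > '31DEC1990'd;
run;
```

CAS tables have no guaranteed row order, so the quarters may not print in time sequence.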

max64
Fluorite | Level 6

Hello. I'm using the ATSM package of PROC TSMODEL. I would like to know whether the EXTREME value of the PREFILTER option of the DIAGSPEC.SetOption method is equivalent to the OUTLIER option of the ARIMAX statement in PROC HPFDIAGNOSE. The two statements are shown below (the first for PROC TSMODEL, the second for PROC HPFDIAGNOSE).

 

Many thanks in advance

 

Proc tsmodel;
......
diagspec.setOption('PREFILTER','EXTREME');
......
quit;

Proc hpfdiagnose;
.....
arimax outlier=(detect=yes);
...
run;
