Dear SAS users,
I have a question regarding transfer function identification/estimation using the prewhitening approach in proc arima when both the response (Yt) and input (Xt) series are nonstationary but require different orders of differencing to achieve stationarity.
I came across an older article by Deddens and Ping (available here: https://support.sas.com/resources/papers/proceedings-archive/SUGI85/Sugi-10-15%20Deddens%20Ping.pdf), which suggests handling the CROSSCORR= option differently in such cases. However, I had a hard time digesting it, and I'm unsure whether their concerns are still relevant under current SAS procedures.
The SAS ARIMA documentation states:
When the differencing lists specified in the VAR= option for an input and in the CROSSCORR= option for that input are not the same, PROC ARIMA combines the two lists so that the differencing operators used for prewhitening include all differences in either list (in the least common multiple sense).
I'm confused about what it means to “combine the lists” in the “least common multiple sense,” and whether this has anything to do with my questions stated below.
So, my main questions are:
If Yt and Xt require different differencing to achieve stationarity, what is the correct statement syntax at the identification stage: identify var=? crosscorr=?
How should this differ at the final estimation stage? For example, the above article states on p. 78: "For estimation purposes it may be necessary to add difference factors to the CC term or to delete some difference factors from the ID and CC terms, depending on the results of step 3)."
Any clarification or examples would be greatly appreciated.
Thanks a lot.
If you want different differencing orders for the response and input series, specify them directly in the VAR= and CROSSCORR= options of the IDENTIFY statement. If you fit an ARIMA model for the input before the IDENTIFY statement for the response (the one with the CROSSCORR= option), and the differencing order in that earlier VAR= option matches the one given for the input in CROSSCORR=, there is no ambiguity: the orders specified in VAR= and CROSSCORR= in the IDENTIFY statement for the response are the differencing orders applied before prewhitening. The documentation passage you quoted only explains which differencing is applied before the prewhitening filter when the IDENTIFY statement for the input specifies a different differencing order than the CROSSCORR= option in the IDENTIFY statement for the response.
If the IDENTIFY statement for the response (with the CROSSCORR= option) is followed by an ESTIMATE statement with the P= and/or Q= option, the differencing orders used for the response and input during estimation are exactly those specified in the VAR= and CROSSCORR= options of that IDENTIFY statement. I am not aware of discussions of when you would want different differencing orders at the estimation stage; you may find more detail in the literature. That said, transfer function identification is a complicated process, and it can take several trials to settle on an appropriate model for a given data set. If the identification results lead you to prefer different differencing orders for estimation, you can always specify the final orders in another IDENTIFY statement followed by an ESTIMATE statement to obtain your final estimates.
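As an illustration of the quoted documentation passage, here is one possible reading of "least common multiple sense", written out in symbols. This is an interpretation, not something the documentation or this thread confirms: the assumption is that each differencing list is treated as a collection of lags, and that a lag appearing in both lists is applied only as many times as the list that repeats it most.

```latex
% Assumed notation: m_d^{VAR} and m_d^{CC} count how often lag d appears in the
% differencing list of the input's VAR= option and of CROSSCORR=, respectively.
\[
  D(B) \;=\; \prod_{d} \left(1 - B^{d}\right)^{m_d},
  \qquad
  m_d \;=\; \max\!\left(m_d^{\mathrm{VAR}},\; m_d^{\mathrm{CC}}\right).
\]
% Under this reading (an assumption):
%   VAR= x(1)    and CROSSCORR= x(1)     give  D(B) = (1 - B)
%   VAR= x(1)    and CROSSCORR= x(1,12)  give  D(B) = (1 - B)(1 - B^{12})
%   VAR= x(1)    and CROSSCORR= x(12)    give  D(B) = (1 - B)(1 - B^{12})
%   VAR= x(1,1)  and CROSSCORR= x(1)     give  D(B) = (1 - B)^2
```

When the two lists are identical, every reading gives the same operator, which is why matching the lists removes the ambiguity.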
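A minimal sketch of what this can look like in practice. The data set name `mydata`, the variable names, the differencing lists, and every model order below are hypothetical and chosen only for illustration; they are not recommendations for any particular data.

```sas
/* Hypothetical data set MYDATA with input X and response Y. */
proc arima data=mydata;
   /* Input: first difference, then an ARMA model that defines the prewhitening filter */
   identify var=x(1);
   estimate p=1 q=1;

   /* Response differenced at lags 1 and 12, input differenced at lag 1.        */
   /* The input list matches the earlier IDENTIFY, so no lists need combining.  */
   identify var=y(1,12) crosscorr=(x(1)) nlag=24;

   /* Estimation uses exactly the differencing given in the IDENTIFY above. */
   estimate p=1 input=( 1 $ (1) / (1) x );

   /* If the results suggest other differencing, re-identify and re-estimate. */
   identify var=y(1) crosscorr=(x(1)) noprint;
   estimate p=1 input=( 1 $ (1) / (1) x );
run;
quit;
```

The second IDENTIFY/ESTIMATE pair shows the "specify your final orders in another IDENTIFY statement" route described above.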
I hope this helps.
Dear SASCom1,
Thank you for your detailed reply and clarifications. I appreciate your guidance.
Unfortunately, I haven't yet found additional resources that clarify how SAS handles syntax when the original series Yt and Xt require different differencing orders to achieve stationarity. I’ll continue searching, but it would certainly help if SAS included more concrete examples on this in the documentation—particularly for users like myself who are still learning the details of transfer function modeling in ARIMA.
In the meantime, to ensure I’m correctly applying the prewhitening procedure as outlined in standard time series literature, I plan to manually cross-verify SAS’s automated prewhitening and cross-correlation results with the conventional step-by-step method for transfer function identification. Here’s the process I have in mind:
Step 1: Starting from the original series Xt and Yt, determine the differencing required to make each stationary. Let the differenced series be xt and yt, respectively.
Step 2: Fit an appropriate ARMA model (not ARIMA, as differencing has already been applied) to the input series xt. Estimate the model and obtain the residuals, denoted et_x.
Step 3: Apply the same ARMA model (from Step 2) to the differenced response series yt , and extract the corresponding residuals et_y.
Step 4: Compute and inspect the cross-correlation function (CCF) between et_y and et_x. This helps identify the structure of the transfer function (lags and delay).
Step 5: Use SAS's built-in prewhitening and cross-correlation functionality (via IDENTIFY with CROSSCORR) and compare the results with those from the manual procedure above.
Please let me know if this approach sounds appropriate or if you would suggest any corrections. Again, thank you for your support—your insights are very helpful.
Hi @sasalex2024
Your steps look fine, but some details in step 2 and step 3 are not mentioned, so I will add them here:
After you have estimated the ARMA model for the input, save the ARMA parameter estimates so that they can be used when applying the prewhitening filter.
When applying the prewhitening filter to the response and input, center the series (or the differenced series, if differencing is specified) before applying the filter.
I hope this helps. Please let me know if you have further questions.
Hi SASCom1,
Many thanks for these very important clarifications! Based on them, I've updated the steps below. Are they correct now? Thank you.
Step 1: Starting from the original series Xt and Yt, determine the differencing required to make each stationary. Let the differenced series be xt and yt, respectively.
Step 2: Compute the sample mean of the differenced series xt and yt, then subtract these means to obtain zero-mean (centered) versions of the series. These are the series to which filtering will be applied in subsequent steps.
Step 3: Fit an appropriate ARMA model (not ARIMA, as differencing has already been applied) to the centered input series xt. Estimate the model and obtain the residuals, denoted et_x.
Step 4: Save the estimated AR/MA parameters from Step 3.
Step 5: Apply the inverse filter [phix(B)/thetax(B)] to the centered series yt , where phix(B) and thetax(B) are the AR and MA polynomials, respectively, from the model estimated in Step 3 with those saved AR/MA parameter values. This is to obtain et_y = [phix(B)/thetax(B)]*yt.
Step 6: Compute and inspect the cross-correlation function (CCF) between et_y and et_x. This helps identify the structure of the transfer function (lags and delay).
Step 7: Use SAS's built-in prewhitening and cross-correlation functionality (via IDENTIFY with CROSSCORR) and compare the results with those from the manual procedure above.
When you estimate the model for the input series and save the parameters, you do not take an extra step to center the input series. The model is estimated exactly as specified, and the estimated ARMA parameters are the ones used later in the prewhitening filter. From this step you save the ARMA parameter estimates, not the residuals.
Centering is done only when the prewhitening filter is applied: the filter is applied to the centered response series and the centered input series (or to the centered differenced series if differencing is specified). The residuals obtained from applying the prewhitening filter to the centered response and the centered input are the prewhitened series.
To avoid confusion, I will use the following documentation example to illustrate how to recompute the cross correlations with prewhitening in PROC ARIMA. The example specifies no differencing for either y or x in any statement. If differencing is specified, the prewhitening steps are the same, except that the series must be differenced before the prewhitening filter is applied (you can create the differenced series and then treat them exactly as in the no-differencing case).
SAS Help Center: Model for Series J Data from Box and Jenkins
The relevant code from the PROC ARIMA documentation example is the following:
proc arima data=seriesj;
   /*--- Look at the input process ----------------------------*/
   identify var=x;
   /*--- Fit a model for the input ----------------------------*/
   estimate p=3 plot;
   /*--- Cross-correlation of prewhitened series ---------------*/
   identify var=y crosscorr=(x) nlag=12;
run;
(1). When PROC ARIMA processes the first IDENTIFY statement and the ESTIMATE statement for the input x, the AR(3) model is fit to the x series directly. This is standard IDENTIFY-ESTIMATE processing. The estimated parameters from this model are used later in the prewhitening filter, which is shown in Output 8.3.5:
Output 8.3.5: Prewhitening Filter
Autoregressive Factors | |
---|---|
Factor 1: | 1 - 1.97607 B**(1) + 1.37499 B**(2) - 0.34336 B**(3) |
So to reproduce this step, specify the same IDENTIFY and ESTIMATE statements as above and add the OUTEST= option to the ESTIMATE statement to save the estimated AR parameters.
(2). In the next IDENTIFY statement, the series are prewhitened with the filter above before the cross correlations are computed. Before the filter is applied, y and x are centered: centered y = y - mean(y) and centered x = x - mean(x). The AR prewhitening filter is then applied to centered y and centered x, and the resulting residuals are the prewhitened series.
To reproduce these results yourself, fit an AR(3) model with the NOCONSTANT option to centered y and to centered x, but fix the AR parameters at the estimates above by using the AR= and NOEST options in the ESTIMATE statement. For example, suppose you have created centered x as cx and centered y as cy in the data set:
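A short sketch of step (1) with the estimates saved. The output data set name `est_x` is a hypothetical choice; `seriesj` is the data set from the documentation example.

```sas
proc arima data=seriesj;
   identify var=x noprint;
   /* OUTEST= writes the parameter estimates and related statistics to a data set */
   estimate p=3 outest=est_x noprint;
run;
quit;

/* Look at the row that holds the parameter estimates */
proc print data=est_x(where=(_type_='EST'));
   var _type_ mu ar1_1 ar1_2 ar1_3;
run;
```

The AR1_1 to AR1_3 values in that row are the coefficients that define the prewhitening filter, stored at full precision rather than the rounded values printed in the output table.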
proc arima data = ;
The cross correlations from the above step would be the same as those obtained in the PROC ARIMA steps in the documentation example.
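The program in the post appears truncated at this point. The following is only a sketch of what steps (1) and (2) describe, not the original program. The data set names under `work`, the variable names `res_x` and `res_y`, and the use of PROC STANDARD for centering are assumptions. The AR values are the rounded coefficients printed in Output 8.3.5, so results would agree with PROC ARIMA only up to that rounding; reading the values from an OUTEST= data set avoids this.

```sas
/* Copy x and y, then shift the copies to mean zero.                       */
/* PROC STANDARD with MEAN=0 and no STD= changes the location only.        */
data work.sj;
   set seriesj;
   cx = x;
   cy = y;
run;

proc standard data=work.sj mean=0 out=work.centered;
   var cx cy;
run;

proc arima data=work.centered;
   /* Fixed AR(3) filter on centered x: NOEST keeps the AR= values as given */
   identify var=cx noprint;
   estimate p=3 noconstant ar=1.97607 -1.37499 0.34336 noest noprint;
   forecast out=work.outx(keep=residual rename=(residual=res_x)) lead=0 noprint;
run;
   /* Same fixed filter on centered y */
   identify var=cy noprint;
   estimate p=3 noconstant ar=1.97607 -1.37499 0.34336 noest noprint;
   forecast out=work.outy(keep=residual rename=(residual=res_y)) lead=0 noprint;
run;
quit;

/* Put the two prewhitened series side by side and cross-correlate them */
data work.pw;
   merge work.outx work.outy;
run;

proc arima data=work.pw;
   identify var=res_y crosscorr=(res_x) nlag=12;
run;
quit;
```

No METHOD= option is given in the filtering steps, so the default conditional least squares method is used to compute the residuals.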
*Note: In the case you specify differencing for x and y, for example, if you specify the following instead:
proc arima data=;
identify var=x(1);
estimate p = 3;
identify var=y(12) crosscorr=(x(1)) ;
then the only change you need to make in the reproducing steps above is this: instead of creating the centered x and centered y, first create difx = x(1) and dif12y = y(12), and then center difx and dif12y to obtain the two centered differenced series. All other steps follow the same logic as in the example above.
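A small sketch of that change, assuming a hypothetical data set `mydata` that holds x and y. The output data set names and the use of PROC STANDARD for centering are also assumptions.

```sas
data work.diffed;
   set mydata;              /* hypothetical input data set        */
   difx   = dif(x);         /* x(1):  x(t) - x(t-1)               */
   dif12y = dif12(y);       /* y(12): y(t) - y(t-12)              */
run;

/* Center each differenced series; missing leading values stay missing */
proc standard data=work.diffed mean=0 out=work.centered_diff;
   var difx dif12y;
run;
```

The centered `difx` and `dif12y` then take the place of cx and cy in the filtering steps.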
I hope this helps. Please let me know if you have further questions.
Dear SASCom1,
Many thanks for providing this solution and clarifications. I think I've understood you now (hopefully!). I've refined my manual general steps below, do they look correct now?
Step 1: From the original series Xt and Yt, determine whether a transformation is needed to stabilize the variance (e.g., a log transformation). If so, apply the transformation. Then, apply differencing (to the transformed series) as needed to achieve stationarity. Let the final resulting (transformed and differenced) series be xt and yt, respectively.
Step 2: Fit an appropriate ARMA model (not ARIMA, since differencing has already been applied) to the input series xt as-is, without manually centering it. The model includes a constant term by default (unless NOCONSTANT is specified). If the best-fitting ARMA model for xt should not have a constant, then in this step fit it with NOCONSTANT option. Regardless, save the estimated AR and MA parameters. This model will define the prewhitening filter.
Step 3: Center both xt and yt by subtracting their respective sample means. These centered series — call them c_xt and c_yt — will be used in the prewhitening step.
Step 4: Apply the ARMA model filter [phi_x(B)/theta_x(B)] (where B is the lag operator), using the parameters from Step 2, to both c_xt and c_yt. This is done without re-estimating the model and using the NOCONSTANT and NOEST options if coding manually in SAS. The filtered series are the prewhitened series ex_t and ey_t.
Step 5: Compute and inspect the cross-correlation function (CCF) between the prewhitened series ey_t and ex_t. This helps identify the lag structure and delay in the transfer function model.
Step 6: Use SAS’s built-in prewhitening and cross-correlation tools (e.g., IDENTIFY with CROSSCORR) as a cross-check against the above manual results.
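In symbols, Steps 2 to 5 can be summarized as follows. This is a sketch in standard Box-Jenkins notation; the symbols with tildes and hats are introduced here for illustration, and the lag convention shown (positive lag k pairs the input at time t with the response at time t+k) is the usual textbook one rather than something stated in this thread.

```latex
% Step 2: ARMA model fitted to the differenced input x_t (estimates carry hats)
\[
  \hat\phi_x(B)\,\bigl(x_t - \hat\mu_x\bigr) \;=\; \hat\theta_x(B)\,a_t
\]
% Step 3: centering with sample means
\[
  \tilde x_t = x_t - \bar x, \qquad \tilde y_t = y_t - \bar y
\]
% Step 4: the same fixed filter applied to both centered series
\[
  e^{x}_t = \frac{\hat\phi_x(B)}{\hat\theta_x(B)}\,\tilde x_t,
  \qquad
  e^{y}_t = \frac{\hat\phi_x(B)}{\hat\theta_x(B)}\,\tilde y_t
\]
% Step 5: sample cross-correlation at lag k >= 0
\[
  r_{xy}(k) = \frac{c_{xy}(k)}{\sqrt{c_{xx}(0)\,c_{yy}(0)}},
  \qquad
  c_{xy}(k) = \frac{1}{n}\sum_{t=1}^{n-k}
      \bigl(e^{x}_t - \bar e^{x}\bigr)\bigl(e^{y}_{t+k} - \bar e^{y}\bigr)
\]
```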
Yes, your understanding of the steps looks correct to me. If you follow the steps and the resulting cross correlations do not match those from PROC ARIMA directly, please let me know.
Dear SASCom1,
Many thanks for the confirmation.
I just wanted to ask a couple of clarifying questions, if I may:
If one of the series is differenced and the other is not, the number of observations will not match in the end. For example, if the constructed residuals for series x have 100 observations, but for series y only 99 (due to differencing), then when computing the final cross-correlation, the first residual of x should be dropped, correct? This is to ensure that the lengths of the x and y residuals are aligned.
When constructing the ARMA model for the x series, if the constant term is found to be insignificant in that model, it's common to exclude it and specify the 'noconstant' option in the estimate statement. Whether the constant was suppressed or not at that stage, should the 'noconstant' option still be used when filtering the centered x series? Thank you!
1. If you use PROC ARIMA to compute the cross correlations between the prewhitened series, the procedure should automatically exclude observations that are missing because of the different differencing orders, so you do not need to delete them manually. But you can if you want to.
2. Whether or not you specify the NOCONSTANT option in the ARMA model for the input, NOCONSTANT is specified at the filtering stage because the series (or differenced series) have already been centered. If you do leave the constant to be estimated, its estimate is likely to be very close to zero because the series have been centered, so the impact may not be significant.
I hope this helps.
Thank you again, SAScom1, for the clarifications.
I've gone through the manual checking procedure. Borrowing some code from the SAS guides, I generated series for Xt and Yt.
I apologize if the code below isn't very efficient, but I hope it gets the job done. In the final table (CFF_Final_Check), I present the "answers" (CORR_1 and CORR_2), that is, the CCFs obtained using the manual and automatic methods. They appear to be close in value, but not identical.
I may be making a mistake at some stage, which could explain the discrepancy. Could you please take a look? Thank you!
/* Simulate the response series Yt */
proc iml;
   phi = {1 -0.5};
   theta = {1 0.8};
   Ytseries = armasim(phi, theta, 125, 1, 100, -1234321);
   create Yt from Ytseries[colname={'Yt'}];
   append from Ytseries;
quit;

/* Simulate the nonstationary input series Xt */
data test;
   Xt = 100;
   nl = 0;
   al = 0;
   do i = 1 to 100;
      a = rannor(12345);
      n = 0.75 * nl + 0.5 * al + a;
      al = a;
      nl = n;
      z = n + 1;
      Xt = Xt + z;
      date = intnx('month', '1jan1988'd, i-1);
      format date monyy.;
      output;
   end;
   drop nl al i a n z;
run;

data test_new;
   retain date Xt;
   set test;
run;

data final;
   merge test_new yt;
   dXt=dif(Xt);
run;

/* Sample means used for centering */
proc sql noprint;
   select mean(Yt) into :meanYt from final;
   select mean(dXt) into :meandXt from final where dXt is not missing;
quit;

/* Fit the input model and save the ARMA parameters */
proc arima data=final;
   identify var=xt(1) noprint;
   estimate p=1 q=1 method=ml outest=A noprint;
run;
quit;

data _null_;
   set A(obs=1);
   call symputx('ARxt', AR1_1);
   call symputx('MAxt',MA1_1);
run;

data final;
   set final;
   cYt=Yt-&meanYt;
   cdXt=dXt-&meandXt;
run;

proc print data=final;
run;

/* Manual prewhitening: apply the fixed filter to the centered series */
proc arima data=final(firstobs=2);
   identify var=cdXt noprint;
   estimate p=1 q=1 noconstant ar = &ARxt ma = &MAxt noest method=ml noprint;
   forecast out = outx(keep = residual) lead = 0;
run;
quit;

proc arima data=final;
   identify var=cYt noprint;
   estimate p=1 q=1 noconstant ar = &ARxt ma = &MAxt noest method=ml noprint;
   forecast out = outy(keep = residual) lead = 0;
run;
quit;

data outx;
   set outx;
   residual_x = residual;
   drop residual;
run;

data outy;
   set outy;
   if _N_ = 1 then delete;
   residual_y = residual;
   drop residual;
run;

data B;
   do i=1 to 99;
      date = intnx('month','1Feb1988'd,i-1);
      format date monyy.;
      output;
   end;
   drop i;
run;

data CCF;
   merge B outY outX;
run;

/* CCF from the manual procedure */
proc arima;
   identify var=residual_y crosscorr=residual_x outcov=CCF_1;
run;
quit;

/* CCF from PROC ARIMA's built-in prewhitening */
proc arima data=final;
   identify var=Xt(1) noprint;
   estimate p=1 q=1 noprint;
   identify var=Yt crosscorr=Xt(1) outcov=CCF_2;
run;
quit;

data CCF_1;
   set ccf_1;
   CORR_1=corr;
   drop lag var crossvar N cov stderr invcorr partcorr corr;
run;

data CCF_2;
   set ccf_2;
   CORR_2=corr;
   drop corr var crossvar N cov stderr invcorr partcorr;
run;

/* Compare the two sets of cross correlations */
data CFF_Final_Check;
   merge CCF_2 CCF_1;
   if _N_ <= 26 then delete;
run;
In your manual computation, the ARMA parameters come from METHOD=ML, while the direct PROC ARIMA step uses the default METHOD=CLS. To avoid using different parameters in the manual computation, you can save the parameters from the direct PROC ARIMA step itself instead of estimating the model in a separate step. Also, regardless of which estimation method is used to obtain the ARMA prewhitening filter, use the default METHOD=CLS when applying the filter to the response and input series in the manual computation.
Here is a modified program that recomputes the cross correlations with prewhitening in PROC ARIMA following the steps outlined earlier. The recomputed cross correlations match those from PROC ARIMA directly.
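The modified program itself is not reproduced in this thread. What follows is only a sketch of the two changes described above, not the original program. It assumes the data set `final` from the earlier post already contains Xt, Yt, and the centered series cYt and cdXt, and it reuses the macro variable and data set names from that post.

```sas
/* Change 1: save the filter parameters from the direct PROC ARIMA run, */
/* which uses the default METHOD=CLS.                                   */
proc arima data=final;
   identify var=Xt(1) noprint;
   estimate p=1 q=1 outest=A noprint;
   identify var=Yt crosscorr=(Xt(1)) outcov=CCF_2;
run;
quit;

data _null_;
   set A(where=(_type_='EST'));   /* row holding the parameter estimates */
   call symputx('ARxt', AR1_1);
   call symputx('MAxt', MA1_1);
run;

/* Change 2: apply the fixed filter without METHOD=ML, so the default   */
/* conditional least squares residuals are produced.                    */
proc arima data=final(firstobs=2);
   identify var=cdXt noprint;
   estimate p=1 q=1 noconstant ar=&ARxt ma=&MAxt noest noprint;
   forecast out=outx(keep=residual) lead=0 noprint;
run;
quit;

proc arima data=final;
   identify var=cYt noprint;
   estimate p=1 q=1 noconstant ar=&ARxt ma=&MAxt noest noprint;
   forecast out=outy(keep=residual) lead=0 noprint;
run;
quit;
```

The remaining steps (renaming the residuals, aligning the two series, and computing CCF_1) would stay as in the earlier post.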
Dear SASCom1,
Many thanks for your clarifications — they helped a lot!
I'm also curious why enabling method=ml in all the above PROC ARIMA procedures causes the results to differ slightly, whereas using method=cls everywhere makes the results of the manual and direct methods identical.
You can specify any estimation method when fitting the ARIMA model to the input to determine the ARMA parameters used in the prewhitening filter. When you compute the prewhitened series manually, however, the residuals computed with METHOD=CLS are the ones that match the output of the prewhitening filter, so the final cross correlations match. Different estimation methods produce different forecasts and therefore different residuals, even when the parameters are held fixed.
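To make this concrete for the ARMA(1,1) filter used in the example, here is a sketch based on my understanding of how conditional least squares residuals are defined, with unobserved past values set to zero. The symbols are introduced for illustration, and the sign convention assumes the MA polynomial is written as 1 - θB.

```latex
% w_t: centered (and, if needed, differenced) series; phi, theta: fixed filter parameters
\[
  e_t \;=\; w_t \;-\; \phi\, w_{t-1} \;+\; \theta\, e_{t-1},
  \qquad t = 1, \dots, n,
  \qquad w_0 = 0,\; e_0 = 0 .
\]
% This recursion is the filter phi(B)/theta(B) applied to w_t with a zero pre-sample.
% Exact-likelihood (ML) residuals treat the start of the series differently,
% so they need not equal e_t even when phi and theta are identical.
```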