baladim
Calcite | Level 5

Hi all,

Here is my question. When I run PROC ARIMA, the PACF values it reports are not consistent with the PACF values I get from the regression approach. The results of both procedures are shown below. As you can see, in the ARIMA output the PACF at lag 5 is not significant. However, when I regress the variable Passengers on its lagged values, for example:

proc reg data=WORK.Airline;
   model Passengers = Passengers1 Passengers2 Passengers3 Passengers4 Passengers5;
run;

where Passengers is the variable of interest and Passengers5 is its fifth lagged value, the coefficient of Passengers5 is significant and equal to 0.33259. This coefficient should be the same as the PACF at lag 5 shown in the ARIMA plot, but as you can see they are not equal. Your help would be greatly appreciated.

The full code is below. 

Thanks,

Mostafa

 

Parameter Estimates

Variable      Label       Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     Intercept              8.90874          7.78030      1.15     0.2554
Passengers1                          1.25794          0.10476     12.01     <.0001
Passengers2                         -0.55785          0.17133     -3.26     0.0016
Passengers3                          0.25541          0.17871      1.43     0.1566
Passengers4                         -0.31429          0.17258     -1.82     0.0721
Passengers5                          0.33259          0.11256      2.95     0.0040

[Image: PROC ARIMA output (PACF) for Passengers]

FILENAME REFFILE '/home/u50399514/New Folder10/Airline_data.xlsx';

options validvarname=v7;

PROC IMPORT DATAFILE=REFFILE
        DBMS=XLSX
        OUT=WORK.IMPORT;
        GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.IMPORT;
RUN;

libname aa "/home/u50399514/New Folder10/";

Data WORK.Airline;
        set work.import;
run;

/* Plot the raw series */
proc sgplot data=WORK.Airline;
        series x=period y=passengers;
run;

/* Lagged copies of Passengers: the first k rows of PassengersK are missing, */
/* so a regression on lags 1..k can use at most n-k observations           */
Data WORK.Airline;
        set WORK.Airline;
        Passengers1  = lag1(Passengers);
        Passengers2  = lag2(Passengers);
        Passengers3  = lag3(Passengers);
        Passengers4  = lag4(Passengers);
        Passengers5  = lag5(Passengers);
        Passengers6  = lag6(Passengers);
        Passengers7  = lag7(Passengers);
        Passengers8  = lag8(Passengers);
        Passengers9  = lag9(Passengers);
        Passengers10 = lag10(Passengers);
run;

/* One regression per lag order k: the coefficient of the last lag */
/* is the regression-based estimate of the PACF at lag k           */
proc reg data=WORK.Airline;
        model Passengers = Passengers1;
run;

proc reg data=WORK.Airline;
        model Passengers = Passengers1 Passengers2;
run;

proc reg data=WORK.Airline;
        model Passengers = Passengers1 Passengers2 Passengers3;
run;

proc reg data=WORK.Airline;
        model Passengers = Passengers1 Passengers2 Passengers3 Passengers4;
run;

proc reg data=WORK.Airline;
        model Passengers = Passengers1 Passengers2 Passengers3 Passengers4 Passengers5;
run;

proc reg data=WORK.Airline;
        model Passengers = Passengers1 Passengers2 Passengers3 Passengers4 Passengers5 Passengers6;
run;

/* PACF as computed and plotted by PROC ARIMA */
proc ARIMA data=WORK.Airline;
        identify var=Passengers;
run;

sbxkoenk
SAS Super FREQ

Hello,

 

In my view, you are wrongly comparing the significance of partial autocorrelations with the significance of lagged values of an explanatory factor in a regular least squares (or maximum likelihood) SAS/STAT regression.

The partial autocorrelation plot can indeed suggest that the data can be modeled with an x(th)-order autoregressive model, commonly referred to as an AR(x) model, but PROC REG cannot fit autoregressive models.

There is a reason to use PROC ARIMA or another SAS/ETS procedure: your data points are interdependent (autocorrelated). When dealing with time series, PROC REG no longer qualifies.

There is a section 'Autocorrelation in Time Series Data' in the Details tab of the PROC REG documentation. Have a look at that!

 

Good luck,

Koen

 

baladim
Calcite | Level 5

Hi Koen,

Thank you very much for your response; I appreciate it. This method of estimating the partial autocorrelation is correct: it is based on regressing the variable on its own lagged values, and it has been proved theoretically that the coefficient of the kth lagged value equals the partial autocorrelation at lag k. Please see the book Forecasting: Methods and Applications by Makridakis, Wheelwright, and Hyndman (1998), page 321. This is not just a PROC REG run; it is a regression of the variable on its own lagged values, and you need a separate regression for each partial autocorrelation.
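In standard textbook notation (a sketch; the symbols below are mine, not quoted from the book), the result says: for each lag k, fit the order-k autoregression and take its last coefficient as the PACF at lag k. For the true autocorrelations rho_j, the coefficients of that autoregression solve the Yule-Walker equations:

```latex
x_t = c + \phi_{k1} x_{t-1} + \phi_{k2} x_{t-2} + \cdots + \phi_{kk} x_{t-k} + e_t,
\qquad \operatorname{PACF}(k) = \phi_{kk}

\begin{pmatrix}
1          & \rho_1     & \cdots & \rho_{k-1} \\
\rho_1     & 1          & \cdots & \rho_{k-2} \\
\vdots     & \vdots     & \ddots & \vdots     \\
\rho_{k-1} & \rho_{k-2} & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix}
=
\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix}
```

The two definitions coincide for the true autocorrelations; once everything is estimated from a finite sample, least squares and Yule-Walker need not give identical numbers.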

Once again thank you for your attention.

Best,

Mostafa

sbxkoenk
SAS Super FREQ

Hello @baladim ,

 

Thank you for your clear explanation. Sorry, I was not aware of this, but it does make sense when I think it over (and of course, as you say, you need a different regression for each partial autocorrelation).

I also have the book by Makridakis et al. that you mention, so I can look there as well for full comprehension.

In that case, the difference is probably due to a different number of observations being used.

I do not have time right now (today), but if the question is still open tomorrow I will run some tests, with a PROC COMPARE at the end to compare the results from ARIMA and REG (on lagged values).

 

Cheers,

Koen

 

baladim
Calcite | Level 5

Hi Koen,

I would appreciate you looking into this. The differences are sometimes drastic: for example, one estimate is clearly significant while the other is not. I look forward to hearing from you when you have time.

Thanks,

Mostafa

 

sbxkoenk
SAS Super FREQ

Hello @baladim ,

OK. I fished the Makridakis book out of my library and will investigate this afternoon (Brussels time).

Stay tuned!

I should add that the testing and quality assurance of SAS procedures is thorough, elaborate, and of very high quality overall. Moreover, the ARIMA procedure has existed for decades, so decades of user testing come on top of that.

That's why I am sure you (we) are overlooking something. I have no doubt that PROC ARIMA is correct, but your question of course deserves investigation.

More info to come this afternoon.

Koen

 

sbxkoenk
SAS Super FREQ

Hello,

 

My copy of Makridakis is much older (from the eighties), so I cannot locate the section you mention; it is certainly not on the same page number.

I wrote the following small program and can indeed see that, for lag 6, the two estimates are not equal at all (not even close).

I took lag 6 as an example, but you can modify the program to investigate any other lag.

I will investigate further at the end of the afternoon (Brussels time).

QUIT;

ods trace off;
/* 1. PACF from PROC ARIMA: capture the data behind the correlation panel */
ods output SeriesCorrPanel=work.SeriesCorrPanel;
ods exclude DescStats ChiSqAuto;
proc arima data=sashelp.citimon /*plots=none*/;
 identify var=EXVUS;
run;
QUIT;
/* keep only the row for lag 6 */
data work.SeriesCorrPanel;
 set work.SeriesCorrPanel;
 keep  LAGS_SERIES_MAX_NLAGS_ PACF_SERIES_NLAGS_NLAGS_ TIME;
 where LAGS_SERIES_MAX_NLAGS_=6;
run;

/* 2. Regression approach: lagged copies of EXVUS */
data work.for_proc_reg(keep=date EXVUS lag:);
 set sashelp.citimon;
 *lag0_EXVUS=EXVUS;
 lag1_EXVUS=lag1(EXVUS);
 lag2_EXVUS=lag2(EXVUS);
 lag3_EXVUS=lag3(EXVUS);
 lag4_EXVUS=lag4(EXVUS);
 lag5_EXVUS=lag5(EXVUS);
 lag6_EXVUS=lag6(EXVUS);
run;

/* regress EXVUS on lags 1-6 and keep the lag-6 coefficient */
ods output ParameterEstimates=work.REG_ParameterEstimates;
ods exclude NObs ANOVA FitStatistics;
proc reg data=work.for_proc_reg plots=none;
 model EXVUS = lag1_EXVUS lag2_EXVUS lag3_EXVUS
               lag4_EXVUS lag5_EXVUS lag6_EXVUS;
run;
QUIT;
data work.REG_ParameterEstimates;
 set work.REG_ParameterEstimates; 
 where variable='lag6_EXVUS';
run;

/* 3. Put both lag-6 estimates side by side */
data work.merge_both_ds(drop=label);
 LENGTH PACF_SERIES_NLAGS_NLAGS_ Estimate 8;
 merge work.SeriesCorrPanel
       work.REG_ParameterEstimates;
run;
/* end of program */

Cheers,

Koen

 

sbxkoenk
SAS Super FREQ

Hello,

I added an AR(6) fit with PROC ARIMA. The last coefficient of the AR(6) model:

  • is not equal to the partial autocorrelation (PACF) at lag 6 in the plot;
  • is approximately equal to the estimate of lag6_EXVUS in PROC REG (the small difference is due to a different parameter estimation algorithm; the sign difference is because the parameter sits on the other side of the equals sign).
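The sign flip can be written out as follows. This assumes PROC ARIMA's usual way of writing an AR(6) model with the backshift operator B (B x_t = x_{t-1}), mean mu and white noise a_t, and assumes the AR Polynomial table lists the coefficients of the factor itself:

```latex
\bigl(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_6 B^6\bigr)(x_t - \mu) = a_t
\quad\Longleftrightarrow\quad
x_t = \mu\Bigl(1 - \sum_{i=1}^{6}\phi_i\Bigr) + \sum_{i=1}^{6}\phi_i\, x_{t-i} + a_t
```

So the regression coefficient of lag6_EXVUS estimates phi_6, while the B**(6) term of the factor carries -phi_6.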

I have no time to investigate any further right now.

If nobody picks this up (anyone can easily submit my program, as it uses a SASHELP data set that everyone has available), I will ask our Technical Support department tomorrow.

QUIT;

ods trace off;
ods output SeriesCorrPanel=work.SeriesCorrPanel;
ods exclude DescStats ChiSqAuto;
proc arima data=sashelp.citimon /*plots=none*/;
 identify var=EXVUS;
run;
QUIT;
data work.SeriesCorrPanel;
 set work.SeriesCorrPanel;
 keep  LAGS_SERIES_MAX_NLAGS_ PACF_SERIES_NLAGS_NLAGS_ TIME;
 where LAGS_SERIES_MAX_NLAGS_=6;
run;

data work.for_proc_reg(keep=date EXVUS lag:);
 set sashelp.citimon;
 *lag0_EXVUS=EXVUS;
 lag1_EXVUS=lag1(EXVUS);
 lag2_EXVUS=lag2(EXVUS);
 lag3_EXVUS=lag3(EXVUS);
 lag4_EXVUS=lag4(EXVUS);
 lag5_EXVUS=lag5(EXVUS);
 lag6_EXVUS=lag6(EXVUS);
run;

/* AR(6) fit: keep only the AR polynomial table */
ods output ARPolynomial=work.ARPolynomial;
ods exclude DescStats ChiSqAuto
            ParameterEstimates FitStatistics CorrB
            ResidualCorrPanel ResidualNormalityPanel ModelDescription;
proc arima data=sashelp.citimon /*plots=none*/;
 identify var=EXVUS; run;
 estimate p=6;       run;
QUIT;
data work.ARPolynomial; set work.ARPolynomial; where Term='B**(6)'; run;

ods output ParameterEstimates=work.REG_ParameterEstimates;
ods exclude NObs ANOVA FitStatistics;
proc reg data=work.for_proc_reg plots=none;
 model EXVUS = lag1_EXVUS lag2_EXVUS lag3_EXVUS
               lag4_EXVUS lag5_EXVUS lag6_EXVUS;
run;
QUIT;
data work.REG_ParameterEstimates;
 set work.REG_ParameterEstimates; 
 where variable='lag6_EXVUS';
run;

data work.merge_three_ds;
 LENGTH PACF_SERIES_NLAGS_NLAGS_ Estimate Coefficient 8;
 merge work.SeriesCorrPanel       (keep=PACF_SERIES_NLAGS_NLAGS_)
       work.REG_ParameterEstimates(keep=Estimate)
       work.ARPolynomial          (keep=Coefficient);        
run;
/* end of program */

Cheers,

Koen

sbxkoenk
SAS Super FREQ

Hello @baladim and _ALL_,

This question was resolved through an investigation by Technical Support.

In short: the PACF formula in PROC ARIMA is correct. The reason you do not obtain exactly the same results as with the regression approach is the (different number of) observations used, as I suggested in my posts in this thread. For example, once you take the lag of x, you no longer have the full sample.
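In formulas (standard textbook notation; the symbols are mine, not taken from the Technical Support response): the lag-k regression minimizes a sum over only the n-k time points that have all k lags available, with its own intercept c, whereas the Yule-Walker route to a time-series PACF starts from sample autocorrelations that all use the full-sample mean and the same full-length denominator, and then plugs r_1, ..., r_k into the Yule-Walker equations:

```latex
\min_{c,\,\phi_{k1},\dots,\phi_{kk}} \; \sum_{t=k+1}^{n}
  \Bigl( x_t - c - \sum_{j=1}^{k} \phi_{kj}\, x_{t-j} \Bigr)^2
\qquad \text{versus} \qquad
r_j = \frac{\sum_{t=1}^{n-j} (x_t - \bar{x})(x_{t+j} - \bar{x})}
           {\sum_{t=1}^{n} (x_t - \bar{x})^2},
\quad \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t
```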

The PACF uses the full length n of the series: more observations, and therefore different denominators.
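To check this explanation yourself, here is a minimal SAS/IML sketch, assuming SAS/IML is available and that any missing EXVUS values occur only at the start or end of the series. It computes the sample autocorrelations of EXVUS in SASHELP.CITIMON with the overall mean and the full-n denominator, turns them into partial autocorrelations with the Durbin-Levinson recursion (my choice of algorithm for solving the Yule-Walker equations; I am assuming, not quoting the documentation, that this matches what PROC ARIMA does), and then fits the lag-6 least squares regression with intercept on the n-6 complete rows, as PROC REG does. The names pacf_yw, lagNum, pacf and olsLast are hypothetical.

```sas
/* Sketch only. Assumes SAS/IML is licensed; pacf_yw, lagNum, pacf and  */
/* olsLast are hypothetical names. Assumes missing EXVUS values, if any, */
/* occur only at the start or end of the series.                         */
proc iml;
   use sashelp.citimon;
   read all var {EXVUS} into x;
   close sashelp.citimon;

   x = x[loc(x ^= .)];              /* drop missing values               */
   n = nrow(x);
   maxLag = 6;                      /* highest lag to examine            */

   /* Sample ACF: overall mean, full-n denominator at every lag         */
   d  = x - x[:];                   /* x[:] is the mean of x             */
   c0 = ssq(d) / n;
   r  = j(maxLag, 1, 0);
   do k = 1 to maxLag;
      r[k] = sum(d[1:(n-k)] # d[(k+1):n]) / n / c0;
   end;

   /* Durbin-Levinson recursion: phi[k,k] is the PACF at lag k          */
   phi = j(maxLag, maxLag, 0);
   phi[1,1] = r[1];
   do k = 2 to maxLag;
      num = r[k];
      den = 1;
      do i = 1 to k-1;
         num = num - phi[k-1,i] * r[k-i];
         den = den - phi[k-1,i] * r[i];
      end;
      phi[k,k] = num / den;
      do i = 1 to k-1;
         phi[k,i] = phi[k-1,i] - phi[k,k] * phi[k-1,k-i];
      end;
   end;
   lagNum = t(1:maxLag);
   pacf   = vecdiag(phi);
   print lagNum pacf;

   /* Regression approach: OLS with intercept on the n-maxLag rows      */
   /* where all lags exist, like PROC REG on lag1_EXVUS-lag6_EXVUS      */
   k = maxLag;
   y = x[(k+1):n];
   design = j(n-k, 1, 1);           /* intercept column                  */
   do i = 1 to k;
      design = design || x[(k+1-i):(n-i)];   /* lag i of x              */
   end;
   b = solve(t(design)*design, t(design)*y);
   olsLast = b[k+1];                /* coefficient of the lag-maxLag term */
   print olsLast;

   create work.pacf_yw var {lagNum pacf};
   append;
   close work.pacf_yw;
quit;
```

If the explanation holds, the pacf value at lagNum=6 should agree with the lag-6 PACF in the PROC ARIMA plot, while olsLast should agree with the lag6_EXVUS estimate from PROC REG in the program above. The two generally differ because olsLast uses only the rows with complete lags and its own intercept, whereas pacf uses every observation.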

Cheers,

Koen

sbxkoenk
SAS Super FREQ

Hello @baladim ,

You opened the Technical Support track yourself, so you were the first to receive the full response (including a test program).

I mention this only to explain why I post just a summary in this thread rather than the full, elaborate response from TS.

Koen
