How do I specify an AR(1) term in my model using PROC PANEL with pooled OLS estimation?
Below is my code:
proc panel data=mypanel;
id msa year;
model L_RS = L_INC_POP L_POP_EMP L_EMP / BP artest=4 HCCME=3 pooled;
run;
I have attached a portion of my data in an excel doc as an example.
Two types of models in PROC PANEL accommodate an autoregressive structure. One is the Parks method, which estimates a first-order autoregressive model with contemporaneous correlation. The other is the dynamic panel model, which estimates an autoregressive model with lagged values of the dependent variable as regressors. To specify an autoregressive model in PROC PANEL, choose either the Parks method, using the PARKS option in the MODEL statement, or the dynamic panel estimator, using the DYNDIFF (difference GMM) or DYNSYS (system GMM) option in the MODEL statement. Details on the Parks method and the dynamic panel estimator can be found here:
Parks Method for Autoregressive Models (PARKS Option)
Dynamic Panel Estimation (DYNDIFF and DYNSYS Options)
However, if you do not want to use either the Parks method or the dynamic panel estimator and prefer to stay with pooled OLS, the only option is to create a data set that contains the lagged dependent variable, using the LAG statement in PROC PANEL, and then include that lagged variable as a regressor in the pooled OLS model. Note that this does not specify an AR(1) error structure; it simply fits a pooled OLS regression in which one of the regressors is the lagged dependent variable.
The following example illustrates the syntax for the Parks method, the dynamic panel model, and pooled OLS with a lagged dependent variable as a regressor:
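To make the distinction concrete, here is a rough sketch of the two specifications side by side. The notation (y, x, beta, gamma, rho, u, epsilon) is my own shorthand for illustration and is not taken from the SAS documentation.

```latex
% AR(1) error structure: the autoregression lives in the error term.
% (With the Parks method, the coefficient rho_i can differ by cross section.)
\begin{align*}
y_{it} &= x_{it}'\beta + u_{it}, \\
u_{it} &= \rho_i \, u_{i,t-1} + \varepsilon_{it}
\end{align*}

% Pooled OLS with a lagged dependent variable: the lag enters as a regressor,
% and the error term itself has no autoregressive structure imposed.
\begin{align*}
y_{it} &= \gamma \, y_{i,t-1} + x_{it}'\beta + \varepsilon_{it}
\end{align*}
```

In the first form, rho describes persistence in the errors; in the second, gamma is just another regression coefficient, which is why the pooled OLS approach is not an AR(1) error model.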
data Airline;
input Obs AirlineID T C Q PF LF;
Year = T + 1969;
lC = log(C);
lQ = log(Q);
lPF = log(PF);
label lC = "Log Transformation of Costs";
label lQ = "Log Transformation of Quantity";
label lPF = "Log Transformation of Price of Fuel";
label LF = "Load Factor (utilization index)";
datalines;
1 1 1 1140640 0.95276 106650 0.53449
2 1 2 1215690 0.98676 110307 0.53233
3 1 3 1309570 1.09198 110574 0.54774
4 1 4 1511530 1.17578 121974 0.54085
5 1 5 1676730 1.16017 196606 0.59117
6 1 6 1823740 1.17376 265609 0.57542
7 1 7 2022890 1.29051 263451 0.59450
8 1 8 2314760 1.39067 316411 0.59741
9 1 9 2639160 1.61273 384110 0.63852
10 1 10 3247620 1.82544 569251 0.67629
11 1 11 3787750 1.54604 871636 0.60574
12 1 12 3867750 1.52790 997239 0.61436
13 1 13 3996020 1.66020 938002 0.63337
14 1 14 4282880 1.82231 859572 0.65012
15 1 15 4748320 1.93646 823411 0.62560
16 2 1 569292 0.52064 103795 0.49085
17 2 2 640614 0.53463 111477 0.47345
18 2 3 777655 0.65519 118664 0.50301
19 2 4 999294 0.79158 114797 0.51250
20 2 5 1203970 0.84295 215322 0.56678
21 2 6 1358100 0.85289 281704 0.55813
22 2 7 1501350 0.92284 304818 0.55880
23 2 8 1709270 1.00000 348609 0.57207
24 2 9 2025400 1.19845 374579 0.62476
25 2 10 2548370 1.34067 544109 0.62871
26 2 11 3137740 1.32624 853356 0.58915
27 2 12 3557700 1.24852 1003200 0.53261
28 2 13 3717740 1.25432 941977 0.52665
29 2 14 3962370 1.37177 856533 0.54016
30 2 15 4209390 1.38974 821361 0.52878
31 3 1 286298 0.26242 118788 0.52433
32 3 2 309290 0.26643 123798 0.53719
33 3 3 342056 0.30604 122882 0.58212
34 3 4 374595 0.32559 131274 0.57949
35 3 5 450037 0.34571 222037 0.60659
36 3 6 510412 0.36752 278721 0.60727
37 3 7 575347 0.40994 306564 0.58243
38 3 8 669331 0.44802 356073 0.57397
39 3 9 783799 0.53960 378311 0.65426
40 3 10 913883 0.53938 555267 0.63106
41 3 11 1041520 0.46797 850322 0.56924
42 3 12 1125800 0.45054 1015610 0.58968
43 3 13 1096070 0.46879 954508 0.58795
44 3 14 1198930 0.49440 886999 0.56539
45 3 15 1170470 0.49332 844079 0.57708
46 4 1 145167 0.08639 114987 0.43207
47 4 2 170192 0.09674 120501 0.43967
48 4 3 247506 0.14150 121908 0.48893
49 4 4 309391 0.16972 127220 0.48418
50 4 5 354338 0.17381 209405 0.52993
51 4 6 373941 0.16427 263148 0.53272
52 4 7 420915 0.17091 316724 0.54907
53 4 8 474017 0.17784 363598 0.55714
54 4 9 532590 0.19225 389436 0.61138
55 4 10 676771 0.24247 547376 0.64532
56 4 11 880438 0.25651 850418 0.61173
57 4 12 1052020 0.24966 1011170 0.58088
58 4 13 1193680 0.27392 951934 0.57205
59 4 14 1303390 0.37113 881323 0.59457
60 4 15 1436970 0.42141 831374 0.58553
61 5 1 91361 0.05103 118222 0.44288
62 5 2 95428 0.05265 116223 0.46247
63 5 3 98187 0.05635 115853 0.51912
64 5 4 115967 0.06695 129372 0.52933
65 5 5 138382 0.07031 243266 0.55780
66 5 6 156228 0.07396 277930 0.55618
67 5 7 183169 0.08495 317273 0.56933
68 5 8 210212 0.09547 358794 0.58347
69 5 9 274024 0.11981 397667 0.63182
70 5 10 356915 0.15005 566672 0.60472
71 5 11 432344 0.14401 848393 0.58792
72 5 12 524294 0.16930 1005740 0.61616
73 5 13 530924 0.17276 958231 0.60587
74 5 14 581447 0.18667 872924 0.59469
75 5 15 610257 0.21328 844622 0.63555
76 6 1 68978 0.03768 117112 0.44854
77 6 2 74904 0.03978 119420 0.47589
78 6 3 83829 0.04433 116087 0.50056
79 6 4 98148 0.05025 122997 0.50034
80 6 5 118449 0.05505 194309 0.52890
81 6 6 133161 0.05246 307923 0.49536
82 6 7 145062 0.05698 323595 0.51034
83 6 8 170711 0.06149 363081 0.51830
84 6 9 199775 0.06903 386422 0.54672
85 6 10 276797 0.09275 564867 0.55428
86 6 11 381478 0.11264 874818 0.51777
87 6 12 506969 0.15415 1013170 0.58005
88 6 13 633388 0.18646 930477 0.55602
89 6 14 804388 0.24685 851676 0.53779
90 6 15 1009500 0.30401 819476 0.52578
;
proc sort data = Airline;
by AirlineID Year;
run;
/*Parks method*/
proc panel data = Airline;
id AirlineID Year;
model lC = lQ lPF LF / parks rho;
run;
/*Dynamic panel estimator(either DYNDIFF or DYNSYS option with default instruments)*/
proc panel data = Airline;
id AirlineID Year;
model lC = lQ lPF LF / dyndiff;
run;
proc panel data = Airline;
id AirlineID Year;
model lC = lQ lPF LF / dynsys;
run;
/*Pooled OLS with lagged dependent variable regressor*/
/*create lagged dependent variable using LAG statement*/
proc panel data = Airline;
id AirlineId Year;
lag lC(1) /out = A_lag;
run;
proc print data = A_lag ;
run;
/*remove the first year observation where lagged dependent variable is set to missing */
data Air_lag;
set A_lag;
if year = 1970 then delete;
run;
proc print data = Air_lag ;
run;
/*Pooled OLS with lagged dependent variable lC_1 included as a regressor*/
proc panel data = Air_lag;
id AirlineID Year;
model lC = lQ lPF LF lC_1/ pooled ;
run;
Note that in the example above, the LAG statement creates the lagged dependent variable, and its value for the first observation in each cross section is set to missing; those observations are then removed before the pooled OLS estimation. If you prefer, you can use the CLAG, SLAG, ZLAG, or XLAG statement instead of LAG to replace the missing lagged values with a chosen value, as described in the PROC PANEL documentation for these statements.
One more note: your original code specifies the ARTEST= option. This option is valid only for the dynamic panel estimator and is ignored if you specify any other estimation method.
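As a rough illustration of the alternative, the sketch below uses ZLAG, which to my understanding fills the missing lagged values with zero instead of leaving them missing. The data set name A_zlag is just a placeholder, and I am assuming the lagged variable is named lC_1 in the same way as with the LAG statement; check the OUT= data set with PROC PRINT before relying on it.

```sas
/* Sketch only: ZLAG in place of LAG, so the first-year lag is filled rather than missing */
proc panel data = Airline;
id AirlineID Year;
zlag lC(1) / out = A_zlag;   /* A_zlag is a placeholder data set name */
run;

/* Inspect the output to confirm the variable name and the filled values */
proc print data = A_zlag;
run;

/* No observations need to be deleted, because the lag is no longer missing */
proc panel data = A_zlag;
id AirlineID Year;
model lC = lQ lPF LF lC_1 / pooled;   /* lC_1 naming assumed to match the LAG example */
run;
```

Keep in mind that a filled-in value is not a real lagged observation, so this keeps the first year in the sample at the cost of using an artificial regressor value for it.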
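Applied to your own code, the pooled OLS route might look roughly like the sketch below. The data set names mypanel_lag and mypanel_ols are placeholders, and I am assuming the lagged variable comes out as L_RS_1, following the same naming pattern as lC_1 in the Airline example; I have not run this against your data.

```sas
/* Sketch only: your model with a lagged dependent variable and without ARTEST= */
proc sort data = mypanel;
by msa year;
run;

proc panel data = mypanel;
id msa year;
lag L_RS(1) / out = mypanel_lag;   /* placeholder data set name */
run;

/* Drop the first observation of each cross section, where the lag is missing */
data mypanel_ols;
set mypanel_lag;
if missing(L_RS_1) then delete;    /* L_RS_1 naming is an assumption */
run;

proc panel data = mypanel_ols;
id msa year;
model L_RS = L_INC_POP L_POP_EMP L_EMP L_RS_1 / pooled bp hccme=3;
run;
```

If you want an AR(1) error structure instead, keep your original regressors and replace the POOLED option with PARKS, as in the Airline example above.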
I hope this helps.