Dear SAS Community,
I would appreciate your assistance with a question regarding the discrimination between two ARIMA models using a likelihood ratio test.
Suppose, the full model is as follows:
proc arima data=a; identify var=zt; estimate p=3 q=0 method=ml; run;
The restricted model is:
proc arima data=a;
identify var=zt;
estimate p=(1 3) q=0 method=ml;
run;
I understand that one approach is to manually compute the test statistic using the formula:
LR=2*(LL_Full−LL_Restricted) and then compare it with the chi-squared distribution with the appropriate degrees of freedom.
However, I was wondering if there is a more efficient or automated way to perform this test within SAS (for any arbitrary full vs restricted ARIMA models), rather than calculating it manually.
Any guidance or suggestions would be greatly appreciated!
Thank you in advance for your help.
Apologies. Try this one:
/* LR test: If 'Pr > ChiSq' value exceeds 0.05, the model with more parameters is preferred */
data LRTEST;
label LLFull="LL(Full)" LLRed="LL(Reduced)" DF="DF"
LR_Test='LR_Stat' pValue = 'Pr > ChiSq';
format pValue PVALUE6.4;
retain LLFull LLRed LR_Test DF pValue;
merge LLFULL(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueFull))
LLRED(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueRed));
if _STAT_ = "LOGLIK" then do;
LLFull = valueFull;
LLRed = valueRed;
LR_Test = 2 * abs(LLFull - LLRed);
end;
else if _STAT_ = "NPARMS" then do
DF = valueFull - valueRed;
pValue = sdf("ChiSq", LR_Test, DF);
output;
end;
drop _STAT_ valueFull valueRed;
run;
Dear Ksharp,
Thank you very much for the prompt and insightful reply. I've used the modified version of the code you referred to, and though my version perhaps is not the most efficient one, it seems to work. Do you think it can be accepted as a solution? Here is the code below.
/* ARIMA Model 1 (with more parameters) */ proc arima data=a plots=none; identify var=zt; estimate p=3 q=0 method=ml outstat=LLFull; run; /* ARIMA Model 2 (with fewer parameters) */ proc arima data=a plots=none; identify var=zt; estimate p=(1 3) q=0 method=ml outstat=LLRed; run;
/* NOTE: If 'Pr > ChiSq' value exceeds 0.05, the model with more parameters is preferred */
data LRTEST;
retain LR_Test;
merge LLFULL(keep=_VALUE_ rename=(_VALUE_=valueFull))
LLRED(keep=_VALUE_ rename=(_VALUE_=valueRed));
if _N_ = 3 then
LR_Test = 2 * abs(valueFull - valueRed);
if _N_ = 6 then do
DF = valueFull - valueRed;
pValue = sdf("ChiSq", LR_Test, DF);
end;
label LR_Test='LR_Stat' pValue = 'Pr > ChiSq';
drop valueFull valueRed DF;
run;
proc print data=LRTEST (firstobs=6 obs=6) label noobs;
run;
Your code looks good. Maybe @Rick_SAS could answer your question.
Your code looks correct. I suggest a few minor changes for readability and efficiency:
data A;
call streaminit(123);
do i = 1 to 100;
zt = rand("Normal");
output;
end;
run;
/* ARIMA Models */
proc arima data=a plots=none;
identify var=zt;
Full: estimate p=3 q=0 method=ml outstat=LLFull;
Reduced: estimate p=(1 3) q=0 method=ml outstat=LLRed;
run;
quit;
/* NOTE: If 'Pr > ChiSq' value exceeds 0.05, the model with more parameters is preferred */
data LRTEST;
label valueFull="LL(Full)" valueRed="LL(Reduced)" DF="DF"
LR_Test='LR_Stat' pValue = 'Pr > ChiSq';
format pValue PVALUE6.4;
retain LR_Test DF pValue;
merge LLFULL(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueFull))
LLRED(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueRed));
if _STAT_ = "LOGLIK" then
LR_Test = 2 * abs(valueFull - valueRed);
else if _STAT_ = "NPARMS" then do
DF = valueFull - valueRed;
pValue = sdf("ChiSq", LR_Test, DF);
output;
end;
drop _STAT_;
run;
proc print data=LRTEST label noobs;
run;
Thank you very much Rick_SAS, this is way better than mine. The only thing is, the printed table at the end doesn't display LL values (LOGLIK) in the 1st two columns, but apparently the number of paramaters in those model.
Apologies. Try this one:
/* LR test: If 'Pr > ChiSq' value exceeds 0.05, the model with more parameters is preferred */
data LRTEST;
label LLFull="LL(Full)" LLRed="LL(Reduced)" DF="DF"
LR_Test='LR_Stat' pValue = 'Pr > ChiSq';
format pValue PVALUE6.4;
retain LLFull LLRed LR_Test DF pValue;
merge LLFULL(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueFull))
LLRED(keep=_STAT_ _VALUE_ rename=(_VALUE_=valueRed));
if _STAT_ = "LOGLIK" then do;
LLFull = valueFull;
LLRed = valueRed;
LR_Test = 2 * abs(LLFull - LLRed);
end;
else if _STAT_ = "NPARMS" then do
DF = valueFull - valueRed;
pValue = sdf("ChiSq", LR_Test, DF);
output;
end;
drop _STAT_ valueFull valueRed;
run;