The task is to derive a time-series variable. The rule is:
X(t) = 1.5 * X(t-1) - 0.5 * X(t-2) + e;
X(1) = e;
X(2) = 1.5 * X(1) + e;
where e follows a standard normal distribution. Here is how I generate this sequence:
data have;
  call streaminit(42);
  do i=1 to 200;
    e=rand("normal",0,1);
    output;
  end;
run;
data want;
  set have;
  *Method 1;
  x+sum(0.5*x,-0.5*lag(x),e);
  *Method 2;
  y+0.5*y-0.5*lag(y)+e;
  *Method 3;
  retain z;
  if _n_=1 then z=e;
  if _n_=2 then z=1.5*z+e;
  lag2_z=lag2(z);
  if _n_>2 then z=1.5*z-0.5*lag2_z+e;
run;
When I run this code, only the result of Method 1 is right. Why are Method 2 and Method 3 wrong, and how can I fix them?
Thank you for any hint.
With method 2, the problem is simply that the expression
y+0.5*y-0.5*lag(y)+e;
is to be understood as
y+(0.5*y-0.5*lag(y)+e);
which in the first iteration will set Y to 0. The SUM statement "Y+..." initializes Y to 0, retains Y, and adds whatever comes after the plus sign, like
retain y 0;
y=sum(y,0.5*y-0.5*lag(y)+e);
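But the second argument evaluates to a missing value in the first iteration, because LAG(Y) returns missing on its first call and the arithmetic operators propagate missing values. The SUM function ignores that missing argument, so Y stays 0 in the first iteration. As a result the series starts one observation late, and the first E is never used.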
The method 1 calculation, on the other hand, is equivalent to
retain x 0;
x=sum(x,0.5*x,-0.5*lag(x),e);
which will set X to E on the first iteration, because the SUM function ignores the missing value returned by LAG(X).
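The difference between the two methods comes down to how missing values are handled. Here is a minimal sketch of that behavior, followed by one possible way to keep the operator form of Method 2. The names A, B and WANT2 are only for illustration, and wrapping LAG in COALESCE is my suggestion rather than something from the original post:

```sas
/* The + operator propagates missing values; the SUM function ignores them */
data _null_;
  a = 1 + .;       /* missing */
  b = sum(1, .);   /* 1 */
  put a= b=;
run;

/* Sketch of a fix for Method 2: replace the missing first LAG value with 0 */
data want2;
  set have;
  y + 0.5*y - 0.5*coalesce(lag(y), 0) + e;
run;
```

The LAG call is still executed on every iteration, so its queue stays in step with the observations. The simpler alternative is to write Method 2 with the SUM function, exactly as in Method 1.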
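In method 3, the problem is that Z is assigned before the LAG2 call in the first two iterations, but after the LAG2 call in all later iterations. LAG2 returns whatever value Z had two calls earlier, so its queue ends up holding a mix of already-updated and not-yet-updated values. Method 3 can be rewritten as, e.g.,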
retain z;
lag2_z=lag(z);
if _n_=1 then z=e;
else if _n_=2 then z=1.5*z+e;
else z=1.5*z-0.5*lag2_z+e;
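Note that I changed from LAG2 to LAG, as the LAG call now always comes first. Because it runs before Z is updated, LAG(Z) returns the value Z held at the start of the previous iteration, which is Z(t-2).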
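To check the rewritten Method 3 against Method 1, something like the following could be used. It assumes the HAVE data set from the question; the names CHECK and DIFF are made up for this sketch:

```sas
data check;
  set have;
  /* Method 1, unchanged */
  x + sum(0.5*x, -0.5*lag(x), e);
  /* Method 3, with the LAG call placed before any assignment to Z */
  retain z;
  lag2_z = lag(z);
  if _n_ = 1 then z = e;
  else if _n_ = 2 then z = 1.5*z + e;
  else z = 1.5*z - 0.5*lag2_z + e;
  /* absolute difference between the two results */
  diff = abs(x - z);
run;

/* The largest difference should be zero, or very close to it */
proc means data=check max;
  var diff;
run;
```

The two methods add the terms in a different order, so tiny floating-point differences are possible even when both are correct.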