Solved: Logic error of cumulative statement and lag function

whymath · Posted 06-08-2023 12:33 AM

The task is to derive a time-series variable. The rule is:

X(t) = 1.5 * X(t-1) - 0.5 * X(t-2) + e;
X(1) = e;
X(2) = 1.5 * X(1) + e;

where e follows a standard normal distribution. Here is how I generate this sequence:

data have;
  call streaminit(42);
  do i=1 to 200;
    e=rand("normal",0,1);
    output;
  end;
run;

data want;
  set have;

  *Method 1;
  x+sum(0.5*x,-0.5*lag(x),e);

  *Method 2;
  y+0.5*y-0.5*lag(y)+e;

  *Method 3;
  retain z;
  if _n_=1 then z=e;
  if _n_=2 then z=1.5*z+e;

  lag2_z=lag2(z);
  if _n_>2 then z=1.5*z-0.5*lag2_z+e;
run;

When I run this code, only the result of Method 1 is right. Why are Method 2 and Method 3 wrong, and how can I fix them?

Thank you for any hint.

s_lassen · Posted 06-08-2023 02:15 AM

With method 2, the problem is simply that the expression

y+0.5*y-0.5*lag(y)+e;

is to be understood as

y+(0.5*y-0.5*lag(y)+e);

which leaves Y at 0 in the first iteration. The sum statement "Y+..." initializes Y to 0, retains Y, and adds whatever comes after the plus sign, like

retain y 0;
y=sum(y,0.5*y-0.5*lag(y)+e);

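But the second argument evaluates to a missing value on the first iteration, because LAG(Y) is missing there and the + and - operators propagate missing values. The SUM function ignores that missing argument, so Y stays 0 in the first iteration and the first E never enters the series.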
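The difference between the arithmetic operators and the SUM function can be checked in isolation. The step below is only a sketch: the variable names and the value 0.3 are made up for illustration, and LAG_Y is an ordinary variable set to missing to stand in for what LAG(Y) returns on the first iteration.

```sas
data _null_;
  e=0.3;      /* arbitrary illustrative value */
  y=0;        /* value the sum statement starts from */
  lag_y=.;    /* stands in for LAG(Y) on the first iteration */

  /* Operators propagate the missing value, so the whole expression is missing */
  with_operators=0.5*y-0.5*lag_y+e;

  /* The SUM function skips the missing term and adds the rest */
  with_function=sum(0.5*y,-0.5*lag_y,e);

  put with_operators= with_function=;
run;
```

The log should show with_operators=. and with_function=0.3, along with a note that missing values were generated.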

The method 1 calculation, on the other hand, is equivalent to

retain x 0;
x=sum(x,0.5*x,-0.5*lag(x),e);

which will set X to E on the first iteration, because the SUM function ignores the missing value returned by LAG(X).
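Following that reasoning, one possible repair for Method 2 is to put its terms inside the SUM function, which makes it the same calculation as Method 1. The step below is a sketch that rebuilds the HAVE data set from the question and compares the two columns; the names CHECK and SAME are assumptions added for illustration.

```sas
/* Same input as in the question */
data have;
  call streaminit(42);
  do i=1 to 200;
    e=rand("normal",0,1);
    output;
  end;
run;

data check;
  set have;

  /* Method 1 as posted */
  x+sum(0.5*x,-0.5*lag(x),e);

  /* Method 2 with its terms moved into the SUM function */
  y+sum(0.5*y,-0.5*lag(y),e);

  /* 1 when both columns agree on this observation */
  same=(x=y);
run;

proc means data=check min;
  var same;
run;
```

If the repair works as intended, the minimum of SAME should be 1, since both statements now perform identical arithmetic.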

In method 3, the problem is that in the first two iterations you calculate Z before calling LAG2, but in the remaining iterations you calculate it after the LAG2 call. The values in the LAG2 queue are therefore not consistently two steps behind. Method 3 can be rewritten as e.g.

  retain z;

  lag2_z=lag(z);

  if _n_=1 then z=e;
  else if _n_=2 then z=1.5*z+e;
  else z=1.5*z-0.5*lag2_z+e;

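Note that I changed from LAG2 to LAG because the LAG call now always comes before Z is updated. At that point Z still holds the value from the previous iteration, so LAG(Z) returns the value from two iterations back.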
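To see where the original Method 3 goes wrong, both versions can be run side by side in one step. This is a sketch rather than a definitive test: the names COMPARE, Z_ORIG, Z_FIX, LAG2_ORIG, LAG2_FIX and DIFFERS are made up here, and the 1e-9 tolerance is an arbitrary choice. Each LAG or LAG2 call keeps its own queue, so the two calculations do not interfere.

```sas
/* Same input as in the question */
data have;
  call streaminit(42);
  do i=1 to 200;
    e=rand("normal",0,1);
    output;
  end;
run;

data compare;
  set have;
  retain z_orig z_fix;

  /* Rewritten Method 3: LAG is always called before Z_FIX is updated */
  lag2_fix=lag(z_fix);
  if _n_=1 then z_fix=e;
  else if _n_=2 then z_fix=1.5*z_fix+e;
  else z_fix=1.5*z_fix-0.5*lag2_fix+e;

  /* Original Method 3: LAG2 is called after the update in iterations 1 and 2,
     but before the update from iteration 3 onward */
  if _n_=1 then z_orig=e;
  if _n_=2 then z_orig=1.5*z_orig+e;
  lag2_orig=lag2(z_orig);
  if _n_>2 then z_orig=1.5*z_orig-0.5*lag2_orig+e;

  /* 1 when the two versions disagree on this observation */
  differs=(abs(z_orig-z_fix)>1e-9);
run;

proc print data=compare(obs=8);
  var i e lag2_orig lag2_fix z_orig z_fix differs;
run;
```

Tracing the queue by hand, the original version pushes Z(1), Z(2), Z(2), Z(3), Z(4), ... into LAG2, so LAG2_ORIG is right for observations 3 and 4 but returns the value from three steps back from observation 5 onward. The printout should therefore show DIFFERS switching to 1 at observation 5.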

whymath · Posted 06-09-2023 02:03 AM

Thank you very much, very detailed.
