Hi,
I am using a do loop to create lag variable. The point is I have to use the last observation's data to compute the lagged variable for the next observation. I try the following:
data data.coffee; set data.coffee; by PANID; if first.PANID then NO = 0; NO + 1; array brandloy(6) brandloy1 - brandloy6; array lag_brandloy(6) lag_brandloy1 - lag_brandloy6; do i = 1 to 6; if NO = 1 then brandloy(i) = 0.04; if NO = 1 and brandID = i then brandloy(i) = 0.8; do k = 2 to 160; lag_brandloy(i) = lag(brandloy(i)); if NO = k and brandloy(i)= . then brandloy(i) = sum(0.8*lag_brandloy(i),0.2*brand_lastindicator); end; end; run;
But the result is not what I want.
The lag does not work here.
If I do it one by one like this:
if NO = 2 then brandloy(i) = 0.8*lag_brandloy(i) + 0.2*brand_lastindicator; lag_brandloy(i) = lag(brandloy(i)); if NO = 3 then brandloy(i) = 0.8*lag_brandloy(i) + 0.2*brand_lastindicator; lag_brandloy(i) = lag(brandloy(i)); if NO = 4 then brandloy(i) = 0.8*lag_brandloy(i) + 0.2*brand_lastindicator; lag_brandloy(i) = lag(brandloy(i));
the result is correct. So I wonder why the loop code does not work. And any suggestions about what I should do?
Thanks!
As you have seen, the LAG function is more complex than it looks. For your application, I would recommend skipping it entirely. You can still get the results you want through this combination of steps:
(a) compute the lagged values before computing the unlagged values
(b) retain the unlagged values
For example:
data data.coffee;
set data.coffee;
by panid;
if first.PANID then NO=0;
NO + 1;
array brandloy (6) brandloy1 - brandloy6;
array lag_brandloy (6) lag_brandloy1-lag_brandloy_6;
So far, no changes. But now the idea is this:
retain brandloy1 - brandloy6;
if first.panid=0 then do i=1 to 6;
lag_brandloy(i) = brandloy(i);
end;
do i=1 to 6;
** Compute only the brandloy values, not the lag_brandloy values;
end;
run;
You still need to fill in the details ... computing brandloy values, when to compute lag_brandloy values, etc. But this approach will get around the complexities of the LAG function.
You have shown output data that you don't want. But you haven't shown (1) sampel input data, (2) output data you DO want.
Make your objectives clear - show the input data, and desired output data.
Thanks for your reply!
The input data:
brand_lastindicator
0
1
0
0
1
1
...
My desired output data(which can be realized by code them one by one)
if No=3 and brandloy(i)= . then brandloy(i) = sum(0.8*lag_brandloy(i),0.2*brand_lastindicator); lag_brandloy(i) = lag(brandloy(i)); if No=3 and brandloy(i)= . then brandloy(i) = sum(0.8*lag_brandloy(i),0.2*brand_lastindicator); lag_brandloy(i) = lag(brandloy(i)); if No=4 and brandloy(i)= . then brandloy(i) = sum(0.8*lag_brandloy(i),0.2*brand_lastindicator); lag_brandloy(i) = lag(brandloy(i)); if No=5 and brandloy(i)= . then brandloy(i) = sum(0.8*lag_brandloy(i),0.2*brand_lastindicator);
I want to do the same thing from No=2 to 160.
I do not see the variables PANID, BRANDLOY1, ... BRANDLOY6 in your sample input data. Can you provide some actual data?
Here is a data sample
I don't have the brandloy1-brandloy6 at first, I need to compute them.
For each PANID, I have to compute the brandloy(i) for the first observation, then the second and so on.
As you have seen, the LAG function is more complex than it looks. For your application, I would recommend skipping it entirely. You can still get the results you want through this combination of steps:
(a) compute the lagged values before computing the unlagged values
(b) retain the unlagged values
For example:
data data.coffee;
set data.coffee;
by panid;
if first.PANID then NO=0;
NO + 1;
array brandloy (6) brandloy1 - brandloy6;
array lag_brandloy (6) lag_brandloy1-lag_brandloy_6;
So far, no changes. But now the idea is this:
retain brandloy1 - brandloy6;
if first.panid=0 then do i=1 to 6;
lag_brandloy(i) = brandloy(i);
end;
do i=1 to 6;
** Compute only the brandloy values, not the lag_brandloy values;
end;
run;
You still need to fill in the details ... computing brandloy values, when to compute lag_brandloy values, etc. But this approach will get around the complexities of the LAG function.
The way the LAG<n>() functions work you need to have a separate statement for each stack of lagged values you want to create.
data have ;
input x @@ ;
x_lag1 = lag1(x);
x_lag2 = lag2(x);
x_lag3 = lag3(x);
x_lag4 = lag4(x);
x_lag5 = lag5(x);
x_lag6 = lag6(x);
cards;
0 1 0 0 1 1 0 1 0 1
;
You could generate this repetitive code using a macro if you want.
%macro genlag(var,n);
%local i;
%do i=1 %to &n ;
&var._lag&i = lag&i(&var);
%end;
%mend ;
data want ;
set have ;
%genlag(x,6);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.