Solved: Condition of calculating Lag in a datastep

Phil_NZ · Posted 01-30-2021 05:21 AM

Hi SAS Users,

Regarding LAG function, I have two questions:

1. I saw an example online and see people set the condition

if first.id then lag_value = .;

as the picture below

I do not know what is the code mentioned above for and without this code, whether SAS will generate the missing for the first observation. I ask this question because the official guide from SAS did not mention setting this condition

Link: The one I retrieve the example above: https://www.listendata.com/2016/08/4-ways-to-calculate-lag-and-lead-in-sas.html#:~:text=In%20SAS%2C%....

The official guide for SAS: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lefunctionsref&docsetTarg...

2. Can we use the LAG function directly in a calculation without defining it previously. What I mean is, can I do something like the code below:

data want;
 set have;
   y= lag(x) + lag2(x) + z;
run;

Or I must do something as below:

data want;

  set have;

   lagx=lag(x);
   lag2x=lag2(x);
   y= lagx+lag2x+z;
run;

Thank you!

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

PeterClemmensen · Posted 01-30-2021 05:58 AM

Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?

Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.

Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂

You could do something like this

data have;
input id x z;
datalines;
1 25 1 
1 30 2 
1 35 3 
1 40 4 
2 25 5 
2 30 6 
2 37 7 
;

data want;
   set have;
   by id;
   if first.id then c = 0; c + 1;
   lagx  = ifn(c = 1      , ., lag(x));
   lag2x = ifn(c in (1, 2), ., lag2(x));
   *y = lagx + lag2x + z;
   y = sum(lagx, lag2x, z);
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 01-30-2021 05:30 AM

1) This is done to prevent the lag function from 'crossing' by-groups. This is only an issue if you deal with data with different by-groups.

2) Yes, no problem. However be aware that lag2(x) returns a missing value for the first two observations. Therefore, you will get a missing value for y in the first two obs because you use the + operator and not the Sum Function. See the small example below

data have;
input x z;
datalines;
1 2
3 4
5 6
7 8
9 0
;

data want;
   set have;
   y = lag(x) + lag2(x) + z;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Phil_NZ · Posted 01-30-2021 05:48 AM

Hi @PeterClemmensen !

Thank you for your clear answer!

In case I want to have a result as this picture

I must follow strictly the code they posted, isn't it? Because my data is also like that.

And I think it is what you mentioned "crossing" by-groups.

So, if I want to perform my calculation, I must do two different datasteps

data want;
 set have;
 by id;
 lagx = lag(x);
 lag2x=lag2(x);
 if first.id then lagx = . and lag2x=.;
 if second.id then lag2x=.;
run;

data want1;
 set want;
  y=lagx + lag2x +z;
run;

is it correct? or is there any way to shorten it?

Many thanks!

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

PeterClemmensen · Posted 01-30-2021 05:58 AM

Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?

Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.

Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂

You could do something like this

data have;
input id x z;
datalines;
1 25 1 
1 30 2 
1 35 3 
1 40 4 
2 25 5 
2 30 6 
2 37 7 
;

data want;
   set have;
   by id;
   if first.id then c = 0; c + 1;
   lagx  = ifn(c = 1      , ., lag(x));
   lag2x = ifn(c in (1, 2), ., lag2(x));
   *y = lagx + lag2x + z;
   y = sum(lagx, lag2x, z);
run;

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 01-30-2021 05:33 AM

Usually, when programmers have questions about the Lag Function, it boils down to a common misunderstanding about the Lag Function. The Lag Function is not a 'lookback' function. Rather it is a 'queue' function.

When you use the Lag2 Function, SAS sets up a queue with two elements. Let us think of it like this [ . | . ]. Both elements are missing at the start of execution. Each time it executes, SAS returns the right-most value from the queue. Furthermore, the present value of the value we want to lag, is inserted into the queue from the left. The documentation describes a queue from bottom to top. However, I find it easier to present here with left to right. The principle is the same.

I've written a small blog post about the Lag Function that should clear up most misunderstandings here 🙂

The DATA to DATA Step Macro
Blog: SASnrd

Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Registration is open

Call for Content EXTENDED

Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Re: Condition of calculating Lag in a datastep

Registration is open

Call for Content EXTENDED

SAS Training: Just a Click Away