BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Users,

 

Regarding LAG function, I have two questions:

1. I saw an example online and see people set the condition

if first.id then lag_value = .;

as the picture below

My97_0-1612001716136.png

I do not know what is the code mentioned above for and without this code, whether SAS will generate the missing for the first observation. I ask this question because the official guide from SAS did not mention setting this condition

Link: The one I retrieve the example above: https://www.listendata.com/2016/08/4-ways-to-calculate-lag-and-lead-in-sas.html#:~:text=In%20SAS%2C%....

The official guide for SAS: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lefunctionsref&docsetTarg...

 

2. Can we use the LAG function directly in a calculation without defining it previously. What I mean is, can I do something like the code below:

data want;
 set have;
   y= lag(x) + lag2(x) + z;
run;

Or I must do something as below:

data want;

  set have;

   lagx=lag(x);
   lag2x=lag2(x);
   y= lagx+lag2x+z;
run;

Thank you!

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?

 

Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.

 

Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂

 

You could do something like this

 

data have;
input id x z;
datalines;
1 25 1 
1 30 2 
1 35 3 
1 40 4 
2 25 5 
2 30 6 
2 37 7 
;

data want;
   set have;
   by id;
   if first.id then c = 0; c + 1;
   lagx  = ifn(c = 1      , ., lag(x));
   lag2x = ifn(c in (1, 2), ., lag2(x));
   *y = lagx + lag2x + z;
   y = sum(lagx, lag2x, z);
run;

 

 

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

1) This is done to prevent the lag function from 'crossing' by-groups. This is only an issue if you deal with data with different by-groups.

 

2) Yes, no problem. However be aware that lag2(x) returns a missing value for the first two observations. Therefore, you will get a missing value for y in the first two obs because you use the + operator and not the Sum Function. See the small example below

 

data have;
input x z;
datalines;
1 2
3 4
5 6
7 8
9 0
;

data want;
   set have;
   y = lag(x) + lag2(x) + z;
run;
Phil_NZ
Barite | Level 11

Hi @PeterClemmensen !

 

Thank you for your clear answer!

In case I want to have a result as this picture 

My97_0-1612003348787.png

I must follow strictly the code they posted, isn't it? Because my data is also like that.

And I think it is what you mentioned "crossing" by-groups.

So, if I want to perform my calculation, I must do two different datasteps

data want;
 set have;
 by id;
 lagx = lag(x);
lag2x=lag2(x); if first.id then lagx = . and lag2x=.;
if second.id then lag2x=.; run; data want1; set want; y=lagx + lag2x +z; run;

is it correct? or is there any way to shorten it?

 

Many thanks!

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
PeterClemmensen
Tourmaline | Level 20

Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?

 

Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.

 

Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂

 

You could do something like this

 

data have;
input id x z;
datalines;
1 25 1 
1 30 2 
1 35 3 
1 40 4 
2 25 5 
2 30 6 
2 37 7 
;

data want;
   set have;
   by id;
   if first.id then c = 0; c + 1;
   lagx  = ifn(c = 1      , ., lag(x));
   lag2x = ifn(c in (1, 2), ., lag2(x));
   *y = lagx + lag2x + z;
   y = sum(lagx, lag2x, z);
run;

 

 

PeterClemmensen
Tourmaline | Level 20

Usually, when programmers have questions about the Lag Function, it boils down to a common misunderstanding about the Lag Function. The Lag Function is not a 'lookback' function. Rather it is a 'queue' function.

 

When you use the Lag2 Function, SAS sets up a queue with two elements. Let us think of it like this [ . | . ]. Both elements are missing at the start of execution. Each time it executes, SAS returns the right-most value from the queue. Furthermore, the present value of the value we want to lag, is inserted into the queue from the left. The documentation describes a queue from bottom to top. However, I find it easier to present here with left to right. The principle is the same.

 

I've written a small blog post about the Lag Function that should clear up most misunderstandings here 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1233 views
  • 3 likes
  • 2 in conversation