Hi SAS Users,
Regarding LAG function, I have two questions:
1. I saw an example online and see people set the condition
if first.id then lag_value = .;
as the picture below
I do not know what is the code mentioned above for and without this code, whether SAS will generate the missing for the first observation. I ask this question because the official guide from SAS did not mention setting this condition
Link: The one I retrieve the example above: https://www.listendata.com/2016/08/4-ways-to-calculate-lag-and-lead-in-sas.html#:~:text=In%20SAS%2C%....
The official guide for SAS: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lefunctionsref&docsetTarg...
2. Can we use the LAG function directly in a calculation without defining it previously. What I mean is, can I do something like the code below:
data want;
set have;
y= lag(x) + lag2(x) + z;
run;
Or I must do something as below:
data want;
set have;
lagx=lag(x);
lag2x=lag2(x);
y= lagx+lag2x+z;
run;
Thank you!
Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?
Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.
Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂
You could do something like this
data have;
input id x z;
datalines;
1 25 1
1 30 2
1 35 3
1 40 4
2 25 5
2 30 6
2 37 7
;
data want;
set have;
by id;
if first.id then c = 0; c + 1;
lagx = ifn(c = 1 , ., lag(x));
lag2x = ifn(c in (1, 2), ., lag2(x));
*y = lagx + lag2x + z;
y = sum(lagx, lag2x, z);
run;
1) This is done to prevent the lag function from 'crossing' by-groups. This is only an issue if you deal with data with different by-groups.
2) Yes, no problem. However be aware that lag2(x) returns a missing value for the first two observations. Therefore, you will get a missing value for y in the first two obs because you use the + operator and not the Sum Function. See the small example below
data have;
input x z;
datalines;
1 2
3 4
5 6
7 8
9 0
;
data want;
set have;
y = lag(x) + lag2(x) + z;
run;
Hi @PeterClemmensen !
Thank you for your clear answer!
In case I want to have a result as this picture
I must follow strictly the code they posted, isn't it? Because my data is also like that.
And I think it is what you mentioned "crossing" by-groups.
So, if I want to perform my calculation, I must do two different datasteps
data want;
set have;
by id;
lagx = lag(x);
lag2x=lag2(x);
if first.id then lagx = . and lag2x=.;
if second.id then lag2x=.;
run;
data want1;
set want;
y=lagx + lag2x +z;
run;
is it correct? or is there any way to shorten it?
Many thanks!
Well actually, in this case you have to be extra careful not crossing by-groups because you have the lag1 and the lag2 function to deal with. So for the Lag1 Function (same as Lag), you do not want the lagged value in the first obs of each by group and for the Lag2 Function you do not want the lagged value in the first 2 observations of the by-group. Makes sense?
Also, you may not want to have missing values for y is just lag2x is missing. Therefore, you can use the Sum Function instead of +.
Question: Do you have more lags than this in your actual data? If so, there are better ways to do this 🙂
You could do something like this
data have;
input id x z;
datalines;
1 25 1
1 30 2
1 35 3
1 40 4
2 25 5
2 30 6
2 37 7
;
data want;
set have;
by id;
if first.id then c = 0; c + 1;
lagx = ifn(c = 1 , ., lag(x));
lag2x = ifn(c in (1, 2), ., lag2(x));
*y = lagx + lag2x + z;
y = sum(lagx, lag2x, z);
run;
Usually, when programmers have questions about the Lag Function, it boils down to a common misunderstanding about the Lag Function. The Lag Function is not a 'lookback' function. Rather it is a 'queue' function.
When you use the Lag2 Function, SAS sets up a queue with two elements. Let us think of it like this [ . | . ]. Both elements are missing at the start of execution. Each time it executes, SAS returns the right-most value from the queue. Furthermore, the present value of the value we want to lag, is inserted into the queue from the left. The documentation describes a queue from bottom to top. However, I find it easier to present here with left to right. The principle is the same.
I've written a small blog post about the Lag Function that should clear up most misunderstandings here 🙂
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.