Hello,
I am trying to understand how the LAGn function works. I have the following test code (it doesn't really make sense, just did it to understand the processing):
data demo; input ID $ day; cards; A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 A 9 ; data test; set demo; three_days_ago=lag3(day); if mod(day, 2)=0 then three_days_ago_even=lag3(day); run;
According to my current understanding LAGn creates a FIFO queue of size n and each time it's called it returns the first value in the queue (the oldest added) and adds the value of the lagged variable at the current observation to the end of the queue. So I supposed I would get something like this:
ID day three_days_ago three_days_ago_even (initial queue: {., ., .})
A 1 . . (queue returns ., queue after step: {., ., 1}
A 2 . . (queue returns . and ., queue after step: {1, 2, 2}
A 3 1 . (queue returns 1, queue after step: {2, 2, 3}
A 4 2 2 (queue returns 2 and 2, queue after step: {3, 4, 4}
A 5 3 . (queue returns 3, queue after step: {4, 4, 5}
A 6 4 4 (queue returns 4 and 4, queue after step: {5, 6, 6}
A 7 5 . (queue returns 5, queue after step: {6, 6, 7}
A 8 6 6 (queue returns 6 and 6, queue after step: {7, 8, 8}
A 9 7 . (queue returns 7, queue after step: {8, 8, 9}
However, I got this:
ID day three_days_ago three_days_ago_even
A 1 . .
A 2 . .
A 3 . .
A 4 1 .
A 5 2 .
A 6 3 .
A 7 4 .
A 8 5 2
A 9 6 .
Could you please point out where I'm mistaken?
Thank you in advance!
@Cynthia_sas wrote:
Hi:
Have you discovered this Tech Support note https://support.sas.com/kb/24/665.html ? It explains why you should not use lag conditionally and has a sample program that shows the wrong and right way to invoke the LAG function.
...
There are real use cases for using lag conditionally, primarily because it allows you to manage multiple queues. Consider a case where you have a number of stock trade daily closing prices, sorted chronologically, but with some dates missing randomly for each stock. Then to get the return (close/lag(close)-1) for a given stock, you would benefit from LAG in the THEN clause of an IF statement, as in:
data have;
set sashelp.stocks;
if ranuni(0598150)<0.2 then delete;
format date date9.;
run;
proc sort;
by date;
run;
data want;
set have;
if stock='IBM' then return=close/lag(close)-1; else
if stock='Intel' then return=close/lag(close)-1; else
if stock='Microsoft' then return=close/lag(close)-1;
run;
Ordinarily if you have a structure like:
if stock='IBM' then return=function(argument); else
if stock='Intel' then return=function(argument); else
if stock='Microsoft' then return=function(argument);
good programming practice would favor
if stock in ('IBM','Intel','Microsoft') then return=function(argument);
But because the lag function manages a queue, this would fail, because it has only one queue where three queues are needed.
Looks right to me.
The even observations are for day 2,4,6,8, etc.
SInce you are using LAG3() there should be missing values for the first 3 of those.
To get what you desire you need to use LAG1() , also known as LAG(), instead. Although your variable name seems wrong. Try this statement instead:
if mod(day, 2)=0 then lagged_even_days_only=lag(day);
What you really want is to always update the queue, but conditionally return the lag3 value. So change
if mod(day, 2)=0 then three_days_ago_even=lag3(day);
to
three_days_ago_even=ifn(mod(day, 2)=0,lag3(day),.);
Unlike the IF statement, the IFN function always calculates both possible outcomes, and then choose one based on the IF condition in the first parameter. This means the LAG3 queue is always updated, but not always returned.
For more on this, take a look at Leads and Lags: Static and Dynamic Queues in the SAS® DATA STEP
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.