BookmarkSubscribeRSS Feed
Magdalena318
Calcite | Level 5

Hello, 

 

I am trying to understand how the LAGn function works. I have the following test code (it doesn't really make sense, just did it to understand the processing):

 

 

data demo;
                input ID $ day;
                cards;
                A 1
                A 2
                A 3
                A 4
                A 5
                A 6
                A 7
                A 8
                A 9
;

data test;
                set demo;
                three_days_ago=lag3(day);
                if mod(day, 2)=0 then three_days_ago_even=lag3(day);
run;

 

According to my current understanding LAGn creates a FIFO queue of size n and each time it's called it returns the first value in the queue (the oldest added) and adds the value of the lagged variable at the current observation to the end of the queue. So I supposed I would get something like this:

 

 

ID            day       three_days_ago three_days_ago_even      (initial queue: {., ., .})

A            1              .                             .                                              (queue returns ., queue after step: {., ., 1}

A            2             .                             .                                              (queue returns . and ., queue after step: {1, 2, 2}

A            3             1                             .                                              (queue returns 1, queue after step: {2, 2, 3}

A            4             2                           2                                              (queue returns 2 and 2, queue after step: {3, 4, 4}

A            5             3                           .                                               (queue returns 3, queue after step: {4, 4, 5}

A            6             4                          4                                               (queue returns 4 and 4, queue after step: {5, 6, 6}

A            7             5                           .                                               (queue returns 5, queue after step: {6, 6, 7}

A            8             6                          6                                               (queue returns 6 and 6, queue after step: {7, 8, 8}

A            9             7                          .                                                (queue returns 7, queue after step: {8, 8, 9}

 

However, I got this:

ID            day       three_days_ago three_days_ago_even

A            1              .                             .

A            2             .                             .

A            3             .                             .

A            4             1                            .

A            5             2                           .

A            6             3                           .

A            7             4                           .

A            8             5                          2

A            9             6                          .

 

Could you please point out where I'm mistaken?

Thank you in advance!

4 REPLIES 4
Cynthia_sas
SAS Super FREQ
Hi:
Have you discovered this Tech Support note https://support.sas.com/kb/24/665.html ? It explains why you should not use lag conditionally and has a sample program that shows the wrong and right way to invoke the LAG function.
As part of the explanation, the note says: "The value of the LAG function is returned to a queue. Each time the LAG function is executed, the value at the top of its queue is removed and returned, and all remaining values are shifted upward. The new value returned by the function is placed at the bottom of the queue. This process occurs only when the function is executed. A conditionally executed LAG function only stores and return a value from the observation that meets the condition. That means the LAG function returns the value from the last observation that met the conditional logic, which might not be the previous observation."
One of the best practices I was taught was that LAG should not be invoked within an IF statement, but should always be issued unconditionally and then manipulated as shown in the Tech Support note.
Cynthia
mkeintz
PROC Star

@Cynthia_sas wrote:
Hi:
Have you discovered this Tech Support note https://support.sas.com/kb/24/665.html ? It explains why you should not use lag conditionally and has a sample program that shows the wrong and right way to invoke the LAG function.

...

There are real use cases for using lag conditionally, primarily because it allows you to manage multiple queues.  Consider a case where you have a number of stock trade daily closing prices, sorted chronologically, but with some dates missing randomly for each stock.  Then to get the return (close/lag(close)-1) for a given stock, you would benefit from LAG in the THEN clause of an IF statement, as in:

 

data have;
  set sashelp.stocks;
  if ranuni(0598150)<0.2 then delete;
  format date date9.;
run;
proc sort;
  by date;
run;

data want;
  set have;
  if stock='IBM' then return=close/lag(close)-1;       else
  if stock='Intel' then return=close/lag(close)-1;     else
  if stock='Microsoft' then return=close/lag(close)-1;
run;

Ordinarily if you have a structure like:

 

  if stock='IBM'       then return=function(argument);  else
  if stock='Intel'     then return=function(argument);  else
  if stock='Microsoft' then return=function(argument);

good programming practice would favor

  if stock in ('IBM','Intel','Microsoft') then return=function(argument);

But because the lag function manages a queue, this would fail, because it has only one queue where three queues are needed.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Looks right to me.

The even observations are for day 2,4,6,8, etc.

SInce you are using LAG3() there should be missing values for the first 3 of those.

 

To get what you desire you need to use LAG1() , also known as LAG(), instead.  Although your variable name seems wrong.  Try this statement instead:

if mod(day, 2)=0 then lagged_even_days_only=lag(day);

 

mkeintz
PROC Star

What you really want is to always update the queue, but conditionally return the lag3 value.  So change

 

                if mod(day, 2)=0 then three_days_ago_even=lag3(day);

to

  three_days_ago_even=ifn(mod(day, 2)=0,lag3(day),.);

Unlike the IF statement, the IFN function always calculates both possible outcomes, and then choose one based on the IF condition in the first parameter.  This means the LAG3 queue is always updated, but not always returned.

 

For more on this, take a look at Leads and Lags: Static and Dynamic Queues in the SAS® DATA STEP 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 699 views
  • 1 like
  • 4 in conversation