DATA Step, Macro, Functions and more

LAG function

Reply
Contributor
Posts: 20

LAG function

WHAT IS MEANT BY 

 

x>sum (lag(y),1) ?

 

 and also 

 

what does this mean ... if last.enrolid or dtstart_next>dtend+1 ?

 

Super User
Posts: 13,084

Re: LAG function

The LAG function is used to inspect values of a given variable in previous observations, or rows, of data.

So sum(lag(y),1) would likely be expecting to add 1 to the value of the variable Y on the previous row. However there are some serious issues that arise when the function LAG is used in conditionals.

Then the value of X is tested to see if it is larger than that computed sum.

We would need the complete code of a data step as well as some example data to go into more details.

 

When a data step uses BY processing such as

Data want;

   set have;

   by enrolid;

 

the data set Have is expected to be sorted by the variable Enrolid. This creates automatic variable that you can test to see if the record is the first or last in the group of values for Enrolid. The First. and Last.  (note the dots there) have values of true or false (or 1 and 0 if you prefer)

If last.enrolid would be true if the last of a group and false otherwise.

So IF is testing if the current record is the last in group of erolid values OR the variable dtstart_next is greater than 1 plus the value of dtend.

Contributor
Posts: 20

Re: LAG function

this is somewhat what the code looks like :
if first.enrolid or dtstart >sum (lag(dtend).,1) then dtstat_beg= dtstart
retain dtstart_beg ;

if last.enrolid or dtstart_next>dtend+1 ;
Super User
Posts: 13,084

Re: LAG function


@manya92 wrote:
this is somewhat what the code looks like :
if first.enrolid or dtstart >sum (lag(dtend).,1) then dtstat_beg= dtstart
retain dtstart_beg ;

if last.enrolid or dtstart_next>dtend+1 ;

A "somewhat" is insufficient when dealing with LAG values.

Though roughly 90% (swagged) of the time a LAG on any line with an IF is not what the user expects or intends. Please read the documentation of the function.

Also look at this brief example:

data have;
   input x value;
datalines;
3  1
2  3
3  5
2  1
;
run;

data example;
   set have;
   if x=3 then y= lag(value);
   if x=2 and value>lag(y) then z=value;
run;

Try to estimate the values of y and z for each record.

 

Then run the code.

Then try to figure out where the results come from.

The run this code and examine

data example2;
   set have;
   lv = lag(value);
   if x=3 then y= lv;
   if x=2 and value>lv then z=value;
run;

And as an added note: it almost always is incorrect to use First.variable with lag as the lagged value is from the previous variable level.

 

And LAST.Variable only a good idea when Last.variable is not equal to First.Variable (both are true when a record is both the first and last of group i.e.only one record with that value (or combination of variables )

Trusted Advisor
Posts: 1,294

Re: LAG function

Warning someone about using lag function in an IF statement makes sense when the lag function is in the THEN clause, because the lag would be conditionally executed.  And that in turn would reveal the difference between conditionally updating a queue (which is what the lag function actually does) and conditionally looking back a row that one trivially does in a spreadsheet. 

 

But in this case the lag function is not being conditionally executed, because it is in the if TEST.  I.e. it's always being executed (even when the lag is part of a compound expression), and will not lead to the sort of problem displayed in @ballardw's example. 

 

The origin of this question by @manya92 is in https://communihow to code for continous enrollment using LAG and RETAIN

 

 

 

 

Ask a Question
Discussion stats
  • 4 replies
  • 110 views
  • 0 likes
  • 3 in conversation