BookmarkSubscribeRSS Feed
manya92
Fluorite | Level 6

WHAT IS MEANT BY 

 

x>sum (lag(y),1) ?

 

 and also 

 

what does this mean ... if last.enrolid or dtstart_next>dtend+1 ?

 

4 REPLIES 4
ballardw
Super User

The LAG function is used to inspect values of a given variable in previous observations, or rows, of data.

So sum(lag(y),1) would likely be expecting to add 1 to the value of the variable Y on the previous row. However there are some serious issues that arise when the function LAG is used in conditionals.

Then the value of X is tested to see if it is larger than that computed sum.

We would need the complete code of a data step as well as some example data to go into more details.

 

When a data step uses BY processing such as

Data want;

   set have;

   by enrolid;

 

the data set Have is expected to be sorted by the variable Enrolid. This creates automatic variable that you can test to see if the record is the first or last in the group of values for Enrolid. The First. and Last.  (note the dots there) have values of true or false (or 1 and 0 if you prefer)

If last.enrolid would be true if the last of a group and false otherwise.

So IF is testing if the current record is the last in group of erolid values OR the variable dtstart_next is greater than 1 plus the value of dtend.

manya92
Fluorite | Level 6
this is somewhat what the code looks like :
if first.enrolid or dtstart >sum (lag(dtend).,1) then dtstat_beg= dtstart
retain dtstart_beg ;

if last.enrolid or dtstart_next>dtend+1 ;
ballardw
Super User

@manya92 wrote:
this is somewhat what the code looks like :
if first.enrolid or dtstart >sum (lag(dtend).,1) then dtstat_beg= dtstart
retain dtstart_beg ;

if last.enrolid or dtstart_next>dtend+1 ;

A "somewhat" is insufficient when dealing with LAG values.

Though roughly 90% (swagged) of the time a LAG on any line with an IF is not what the user expects or intends. Please read the documentation of the function.

Also look at this brief example:

data have;
   input x value;
datalines;
3  1
2  3
3  5
2  1
;
run;

data example;
   set have;
   if x=3 then y= lag(value);
   if x=2 and value>lag(y) then z=value;
run;

Try to estimate the values of y and z for each record.

 

Then run the code.

Then try to figure out where the results come from.

The run this code and examine

data example2;
   set have;
   lv = lag(value);
   if x=3 then y= lv;
   if x=2 and value>lv then z=value;
run;

And as an added note: it almost always is incorrect to use First.variable with lag as the lagged value is from the previous variable level.

 

And LAST.Variable only a good idea when Last.variable is not equal to First.Variable (both are true when a record is both the first and last of group i.e.only one record with that value (or combination of variables )

mkeintz
PROC Star

Warning someone about using lag function in an IF statement makes sense when the lag function is in the THEN clause, because the lag would be conditionally executed.  And that in turn would reveal the difference between conditionally updating a queue (which is what the lag function actually does) and conditionally looking back a row that one trivially does in a spreadsheet. 

 

But in this case the lag function is not being conditionally executed, because it is in the if TEST.  I.e. it's always being executed (even when the lag is part of a compound expression), and will not lead to the sort of problem displayed in @ballardw's example. 

 

The origin of this question by @manya92 is in https://communihow to code for continous enrollment using LAG and RETAIN

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 801 views
  • 0 likes
  • 3 in conversation