BookmarkSubscribeRSS Feed
djohn051
Fluorite | Level 6

I have first used the following syntax to produce a flag which finds the first instance, within the same ID, where a value for variable1 is found with a value of <-105:

 

IF (ID=LAG(ID) AND variable1<-105 AND LAG(variable1) GE -105) THEN FLAG=1;
RUN;

What I would then like to do is flag every subsequent case after the one I have just flagged, within the same ID. So far I can only flag the one that is directly beneath the initial one I flagged using this syntax:

IF (ID=LAG(ID) AND LAG(flag)=1) THEN FLAG=1;
RUN;

Is there a way to run this code so that it flags every subsequent case following that initial flag, within the same ID?

9 REPLIES 9
Reeza
Super User
You should never be using ID=lag(ID) in SAS except in rare circumstances. Instead, use BY group processing, much more effective. LAG() functions don't work in conditional statements so you need to pull it out of the conditional IF and calculate it regardless. You can set it to missing after, conditionally.
mkeintz
PROC Star

@Reeza :

 

Regarding your caveat "LAG() functions don't work in conditional statements so you need to pull it out of the conditional IF and calculate it regardless".

 

I know that this advice is meant to correct the naïve understanding that LAG is a lookback, (rather than a queue-update, which is what it really is).   But IMO "don't work" is a little overstated, for a couple of reasons:

 

  • if-clause vs then-clause:  First, while using lag functions in the then-clause of an IF ... THEN statement can be counterintuitive, using them in the if-clause (even a compound if-clause) does not present any unexpected behavior.  I have tested (and I recall a discussion by @ChrisHemedinger on the subject), and have always found that every component of a compound if-condition is evaluated, even if logically unnecessary (i.e. when A=1 then testing "IF A=1 or B=1" theoretically doesn't need to evaluate B=1).  So "if A  or lag(b)=x" always updates the lag(b) queue, regardless of whether A is true.

 

 

Also, I recommend embedding the lag function inside an IFN or IFC function.  This will unconditionally execute the lag function, but only conditionally return the lag value.  For instance if sashelp.stocks were sorted by stock/date,  (it's actually stock/descending date), then calculating a monthly return (current CLOSE vs lagged CLOSE) that avoids contaminating one stock with the lagged value of the prior stock could be done as follows in a single statement:

 

proc sort data=sashelp.stocks out=sorted_stocks;
  by stock date;
run;

data want;
  set sorted_stocks;
  by stock;
  monthly_return=ifn(first.stock,.,(close-lag(close))/lag(close));
run;

But really we should probably "blame" SAS, which affixed the name "lag" to this function.  I suspect that if it were named something like "UPDATE_FIFOQ(x)" there would be much less confusion about its behavior in a conditional statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

Forget about an iterative lag.  In fact, forget about lag entirely.  Just hold onto the flag value once you calculate it.  For example:

 

data want;
set have;
by id;
if first.id then flag=.;
if variable1 < -105 then flag=1;
retain flag;
run;

This only works if FLAG is a newly created variable, not part of the incoming data.

 

Unless there is a more complex definition of what "every subsequent case" means, this should do it.

djohn051
Fluorite | Level 6

It is a bit more complicated than that. The <-105 rule is only to identify the initial case. What I want after that is for every subsequent case (they are already sorted by date) to be flagged as 1, within that given ID, regardless of if the value is < or >= -105.

 

Does that make sense?

PGStats
Opal | Level 21

Then go for:

 

data want;
flag = 0;
do until(last.id);
    set have; by id notsorted;
    if variable1 < -105 then flag = 1;
    output;
    end;
run;
PG
djohn051
Fluorite | Level 6

This seems to have flagged every case as =1

PGStats
Opal | Level 21

Only if the first value of variable1 is <-105 or missing in every id group.

 

data have;
do id = 1, 2, 3;
    do variable1 = 1, 2, -200, 3;
        output;
        end;
    end;
run;

data want;
flag = 0;
do until(last.id);
    set have; by id notsorted;
    if variable1 < -105 then flag = 1;
    output;
    end;
run;

proc print data=want noobs; var id variable1 flag; run;
                              id    variable1    flag

                               1          1        0
                               1          2        0
                               1       -200        1
                               1          3        1
                               2          1        0
                               2          2        0
                               2       -200        1
                               2          3        1
                               3          1        0
                               3          2        0
                               3       -200        1
                               3          3        1
PG
Astounding
PROC Star

OK, adding an OUTPUT statement should fix that:

 

data want;
set have;
by id;
if first.id then flag=.;
output; if variable1 < -105 then flag=1; retain flag; run;
mkeintz
PROC Star

Or if you want to avoid an explicit output statement, the lag function is uniquely useful:

 

data want;
  set have;
  by id;
  retain flag;
  if lag(variable1)<=-105 then flag=1;
  if first.id then flag=.;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4048 views
  • 3 likes
  • 5 in conversation