I have first used the following syntax to produce a flag which finds the first instance, within the same ID, where a value for variable1 is found with a value of <-105:
IF (ID=LAG(ID) AND variable1<-105 AND LAG(variable1) GE -105) THEN FLAG=1;
RUN;
What I would then like to do is flag every subsequent case after the one I have just flagged, within the same ID. So far I can only flag the one that is directly beneath the initial one I flagged using this syntax:
IF (ID=LAG(ID) AND LAG(flag)=1) THEN FLAG=1;
RUN;
Is there a way to run this code so that it flags every subsequent case following that initial flag, within the same ID?
@Reeza :
Regarding your caveat "LAG() functions don't work in conditional statements so you need to pull it out of the conditional IF and calculate it regardless".
I know that this advice is meant to correct the naïve understanding that LAG is a lookback, (rather than a queue-update, which is what it really is). But IMO "don't work" is a little overstated, for a couple of reasons:
Also, I recommend embedding the lag function inside an IFN or IFC function. This will unconditionally execute the lag function, but only conditionally return the lag value. For instance if sashelp.stocks were sorted by stock/date, (it's actually stock/descending date), then calculating a monthly return (current CLOSE vs lagged CLOSE) that avoids contaminating one stock with the lagged value of the prior stock could be done as follows in a single statement:
proc sort data=sashelp.stocks out=sorted_stocks;
by stock date;
run;
data want;
set sorted_stocks;
by stock;
monthly_return=ifn(first.stock,.,(close-lag(close))/lag(close));
run;
But really we should probably "blame" SAS, which affixed the name "lag" to this function. I suspect that if it were named something like "UPDATE_FIFOQ(x)" there would be much less confusion about its behavior in a conditional statement.
Forget about an iterative lag. In fact, forget about lag entirely. Just hold onto the flag value once you calculate it. For example:
data want;
set have;
by id;
if first.id then flag=.;
if variable1 < -105 then flag=1;
retain flag;
run;
This only works if FLAG is a newly created variable, not part of the incoming data.
Unless there is a more complex definition of what "every subsequent case" means, this should do it.
It is a bit more complicated than that. The <-105 rule is only to identify the initial case. What I want after that is for every subsequent case (they are already sorted by date) to be flagged as 1, within that given ID, regardless of if the value is < or >= -105.
Does that make sense?
Then go for:
data want;
flag = 0;
do until(last.id);
set have; by id notsorted;
if variable1 < -105 then flag = 1;
output;
end;
run;
This seems to have flagged every case as =1
Only if the first value of variable1 is <-105 or missing in every id group.
data have;
do id = 1, 2, 3;
do variable1 = 1, 2, -200, 3;
output;
end;
end;
run;
data want;
flag = 0;
do until(last.id);
set have; by id notsorted;
if variable1 < -105 then flag = 1;
output;
end;
run;
proc print data=want noobs; var id variable1 flag; run;
id variable1 flag 1 1 0 1 2 0 1 -200 1 1 3 1 2 1 0 2 2 0 2 -200 1 2 3 1 3 1 0 3 2 0 3 -200 1 3 3 1
OK, adding an OUTPUT statement should fix that:
data want;
set have;
by id;
if first.id then flag=.;
output;
if variable1 < -105 then flag=1;
retain flag;
run;
Or if you want to avoid an explicit output statement, the lag function is uniquely useful:
data want;
set have;
by id;
retain flag;
if lag(variable1)<=-105 then flag=1;
if first.id then flag=.;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.