- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have first used the following syntax to produce a flag which finds the first instance, within the same ID, where a value for variable1 is found with a value of <-105:
IF (ID=LAG(ID) AND variable1<-105 AND LAG(variable1) GE -105) THEN FLAG=1;
RUN;
What I would then like to do is flag every subsequent case after the one I have just flagged, within the same ID. So far I can only flag the one that is directly beneath the initial one I flagged using this syntax:
IF (ID=LAG(ID) AND LAG(flag)=1) THEN FLAG=1;
RUN;
Is there a way to run this code so that it flags every subsequent case following that initial flag, within the same ID?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza :
Regarding your caveat "LAG() functions don't work in conditional statements so you need to pull it out of the conditional IF and calculate it regardless".
I know that this advice is meant to correct the naïve understanding that LAG is a lookback, (rather than a queue-update, which is what it really is). But IMO "don't work" is a little overstated, for a couple of reasons:
- if-clause vs then-clause: First, while using lag functions in the then-clause of an IF ... THEN statement can be counterintuitive, using them in the if-clause (even a compound if-clause) does not present any unexpected behavior. I have tested (and I recall a discussion by @ChrisHemedinger on the subject), and have always found that every component of a compound if-condition is evaluated, even if logically unnecessary (i.e. when A=1 then testing "IF A=1 or B=1" theoretically doesn't need to evaluate B=1). So "if A or lag(b)=x" always updates the lag(b) queue, regardless of whether A is true.
- Sometimes you want conditional execution of lags: There are times when putting a lag in the then-clause does make sense. I have a paper Leads and Lags: Static and Dynamic Queues in the SAS DATA Step, 2nd ed. that shows some of them (in the "irregular interleaved series" section).
Also, I recommend embedding the lag function inside an IFN or IFC function. This will unconditionally execute the lag function, but only conditionally return the lag value. For instance if sashelp.stocks were sorted by stock/date, (it's actually stock/descending date), then calculating a monthly return (current CLOSE vs lagged CLOSE) that avoids contaminating one stock with the lagged value of the prior stock could be done as follows in a single statement:
proc sort data=sashelp.stocks out=sorted_stocks;
by stock date;
run;
data want;
set sorted_stocks;
by stock;
monthly_return=ifn(first.stock,.,(close-lag(close))/lag(close));
run;
But really we should probably "blame" SAS, which affixed the name "lag" to this function. I suspect that if it were named something like "UPDATE_FIFOQ(x)" there would be much less confusion about its behavior in a conditional statement.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Forget about an iterative lag. In fact, forget about lag entirely. Just hold onto the flag value once you calculate it. For example:
data want;
set have;
by id;
if first.id then flag=.;
if variable1 < -105 then flag=1;
retain flag;
run;
This only works if FLAG is a newly created variable, not part of the incoming data.
Unless there is a more complex definition of what "every subsequent case" means, this should do it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is a bit more complicated than that. The <-105 rule is only to identify the initial case. What I want after that is for every subsequent case (they are already sorted by date) to be flagged as 1, within that given ID, regardless of if the value is < or >= -105.
Does that make sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then go for:
data want;
flag = 0;
do until(last.id);
set have; by id notsorted;
if variable1 < -105 then flag = 1;
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This seems to have flagged every case as =1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Only if the first value of variable1 is <-105 or missing in every id group.
data have;
do id = 1, 2, 3;
do variable1 = 1, 2, -200, 3;
output;
end;
end;
run;
data want;
flag = 0;
do until(last.id);
set have; by id notsorted;
if variable1 < -105 then flag = 1;
output;
end;
run;
proc print data=want noobs; var id variable1 flag; run;
id variable1 flag 1 1 0 1 2 0 1 -200 1 1 3 1 2 1 0 2 2 0 2 -200 1 2 3 1 3 1 0 3 2 0 3 -200 1 3 3 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK, adding an OUTPUT statement should fix that:
data want;
set have;
by id;
if first.id then flag=.;output;
if variable1 < -105 then flag=1;
retain flag;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or if you want to avoid an explicit output statement, the lag function is uniquely useful:
data want;
set have;
by id;
retain flag;
if lag(variable1)<=-105 then flag=1;
if first.id then flag=.;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------