I want to flag the first value for var >7 within a by group (ID). What is the simplest way to do this? Thank you!
id var
a 1
a 7
a 10
b 8
b 9
Want
id var flag
a 1
a 7
a 10 1
b 8 1
b 9
Like this?
data WANT;
set HAVE;
by ID;
if first.ID then FOUND=0;
if ^FOUND & VAR > 7 then do;
FOUND+1;
FLAG=1;
end;
run;
Like this?
data WANT;
set HAVE;
by ID;
if first.ID then FOUND=0;
if ^FOUND & VAR > 7 then do;
FOUND+1;
FLAG=1;
end;
run;
Thank you, works! would you mind explaining what the found variable is? especially the ^found part?
data have;
input id $ var ;
cards;
a 1
a 7
a 10
b 8
b 9
;
data want;
do until(last.id);
set have;
by id;
call missing(flag);
if _n_ and var>7 then do;
flag=1;
_n_=0;
end;
output;
end;
run;
The "trick" here is make sure that you don't inadvertently flag a VAR>7 if it is the second (or third, etc.) such record for a given id.
I don't know if I would exactly recommend this technique, but it is an excellent task to demonstrate the queue-management techniques of the LAG function, as in:
data have;
input id $1. var;
datalines;
a 1
a 7
a 10
b 8
b 9
run;
data want;
set have;
if var>7 then do;
if id^=lag(id) then flag=1;
end;
run;
The LAG function only updates its underlying queue (in this case a queue of length 1) when the record in hand has VAR>7. So the only time ID comparisons are made are for those records.
Note you don't need a BY statement or resetting of any variables in this code, although the data must be physically sorted/grouped by ID.
Now if you were to want, say, the first THREE records with var>7 for each ID (where each ID data can have any number of such ID's), then you can replace LAG(ID) with LAG3(ID), as in:
data have;
input id $1. var;
datalines;
a 1
a 7
a 10
a 12
a 13
a 16
b 8
b 9
b 10
b 12
b 13
run;
data want;
set have;
if var>7 then do;
if id^=lag3(id) then flag=1;
end;
run;
The LAG3 function maintains a queue (a FIFO queue) with length 3, i.e. the three most recent id's for records with VAR>7.
@mkeintz Only one flag per ID is wanted as I understand.
I think you may have looked only at my second example. The first example does produce only one flag per id. The reason for the second example was just to show how easy it would be to generate flags for the first X qualifying records for each ID.
And just to gild the lily. What if the OP wanted to flag, say the 2nd through 3rd qualifying records. It would be still be relatively straightforward using the conditional lag functions. It would be:
the first 3, i.e. where lag3(id)^=id
starting with the 2nd, where lag(id)=id
resulting in:
data want;
set have;
if var>7 then do;
if id^=lag3(id) and lag(id)=id then flag=1;
end;
run;
data have;
input id $ var ;
cards;
a 1
a 7
a 10
b 8
b 9
;
proc sort data=have out=haves;
by id descending var;
run;
data want;
set haves;
by id descending var;
if (first.id and first.var and var > 7) then flag=1;
else flag=.;
run;
Your code will work, but you don't need the "and first.var" in the subsetting IF condition.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.