BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KPCklebspn
Obsidian | Level 7

I want to  flag  the  first value for var >7 within  a by  group (ID). What  is the  simplest way to do this?  Thank  you!

 

id     var

a       1

a       7

a       10

b        8

b        9

 


Want

id     var      flag

a       1

a       7

a       10       1

b        8         1

b        9

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT; 
  set HAVE;
  by ID;
  if first.ID then FOUND=0;
  if ^FOUND & VAR > 7 then do;
    FOUND+1;
    FLAG=1;
  end;
run;

 

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT; 
  set HAVE;
  by ID;
  if first.ID then FOUND=0;
  if ^FOUND & VAR > 7 then do;
    FOUND+1;
    FLAG=1;
  end;
run;

 

KPCklebspn
Obsidian | Level 7

Thank you,  works! would you mind explaining  what the found variable is? especially the ^found part?

novinosrin
Tourmaline | Level 20


data have;
input id $    var ;
cards;
a       1

a       7

a       10

b        8

b        9
;

data  want;
 do until(last.id);
  set have;
  by id;
  call missing(flag);
  if _n_ and var>7 then do;
   flag=1;
   _n_=0;
  end;
  output;
 end;
run;
mkeintz
PROC Star

The "trick" here is make sure that you don't inadvertently flag a VAR>7 if it is the second (or third, etc.) such record for a given id.  

 

I don't know if I would exactly recommend this technique, but it is an excellent task to demonstrate the queue-management techniques of the LAG function, as in:

 

data have;
  input id $1.    var;
datalines;
a       1
a       7
a       10
b        8
b        9
run;

data want;
  set have;
  if var>7 then do;
    if id^=lag(id) then flag=1;
  end;
run;

The LAG function only updates its underlying queue (in this case a queue of length 1) when the record in hand has VAR>7.   So the only time ID comparisons are made are for those records.

 

Note you don't need a BY statement or resetting of any variables in this code, although the data must be physically sorted/grouped by ID.

 

Now if you were to want, say, the first THREE records with var>7 for each ID  (where each ID data can have any number of such ID's), then you can replace LAG(ID) with LAG3(ID), as in:

 

data have;
  input id $1.    var;
datalines;
a       1
a       7
a       10
a        12
a        13
a       16
b        8
b        9
b        10
b        12
b        13
run;

data want;
  set have;
  if var>7 then do;
    if id^=lag3(id) then flag=1;
  end;
run;

The LAG3 function maintains a queue (a FIFO queue) with length 3, i.e. the three most recent id's for records with VAR>7.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

@ChrisNZ 

 

I think you may have looked only at my second example.  The first example does produce only one flag per id.  The reason for the second example was just to show how easy it would be to generate flags for the first X qualifying records for each ID.

 

And just to gild the lily.  What if the OP wanted to flag, say the 2nd through 3rd  qualifying records.  It would be still be relatively straightforward using the conditional lag functions. It would be:

    the first 3, i.e. where lag3(id)^=id   

    starting with the 2nd, where  lag(id)=id

resulting in:

 

data want;
  set have;
  if var>7 then do;
    if id^=lag3(id) and lag(id)=id then flag=1;
  end;
run;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
tocilj
Calcite | Level 5
data have;
input id $    var ;
cards;
a       1
a       7
a       10
b        8
b        9
;

proc sort data=have out=haves;
by id descending var;
run;

data want;
set haves;
by id descending var;
if (first.id and first.var and var > 7) then flag=1;
else flag=.;
run;
mkeintz
PROC Star

@tocilj 

 

Your code will work, but you don't need the "and first.var" in the subsetting IF condition.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2536 views
  • 1 like
  • 5 in conversation