Solved: Flagging first variable above a certain value within a by group

KPCklebspn · Posted 07-01-2020 07:04 PM

I want to flag the first value for var >7 within a by group (ID). What is the simplest way to do this? Thank you!

id var

a 1

a 7

a 10

b 8

b 9

Want

id var flag

a 1

a 7

a 10 1

b 8 1

b 9

ChrisNZ · Posted 07-01-2020 07:45 PM

Like this?

data WANT; 
  set HAVE;
  by ID;
  if first.ID then FOUND=0;
  if ^FOUND & VAR > 7 then do;
    FOUND+1;
    FLAG=1;
  end;
run;

High-Performance SAS Coding - Third Edition

View solution in original post

ChrisNZ · Posted 07-01-2020 07:45 PM

Like this?

data WANT; 
  set HAVE;
  by ID;
  if first.ID then FOUND=0;
  if ^FOUND & VAR > 7 then do;
    FOUND+1;
    FLAG=1;
  end;
run;

High-Performance SAS Coding - Third Edition

KPCklebspn · Posted 07-01-2020 09:50 PM

Thank you, works! would you mind explaining what the found variable is? especially the ^found part?

novinosrin · Posted 07-01-2020 07:48 PM



data have;
input id $    var ;
cards;
a       1

a       7

a       10

b        8

b        9
;

data  want;
 do until(last.id);
  set have;
  by id;
  call missing(flag);
  if _n_ and var>7 then do;
   flag=1;
   _n_=0;
  end;
  output;
 end;
run;

mkeintz · Posted 07-01-2020 08:50 PM

The "trick" here is make sure that you don't inadvertently flag a VAR>7 if it is the second (or third, etc.) such record for a given id.

I don't know if I would exactly recommend this technique, but it is an excellent task to demonstrate the queue-management techniques of the LAG function, as in:

data have;
  input id $1.    var;
datalines;
a       1
a       7
a       10
b        8
b        9
run;

data want;
  set have;
  if var>7 then do;
    if id^=lag(id) then flag=1;
  end;
run;

The LAG function only updates its underlying queue (in this case a queue of length 1) when the record in hand has VAR>7. So the only time ID comparisons are made are for those records.

Note you don't need a BY statement or resetting of any variables in this code, although the data must be physically sorted/grouped by ID.

Now if you were to want, say, the first THREE records with var>7 for each ID (where each ID data can have any number of such ID's), then you can replace LAG(ID) with LAG3(ID), as in:

data have;
  input id $1.    var;
datalines;
a       1
a       7
a       10
a        12
a        13
a       16
b        8
b        9
b        10
b        12
b        13
run;

data want;
  set have;
  if var>7 then do;
    if id^=lag3(id) then flag=1;
  end;
run;

The LAG3 function maintains a queue (a FIFO queue) with length 3, i.e. the three most recent id's for records with VAR>7.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 07-01-2020 09:43 PM

@mkeintz Only one flag per ID is wanted as I understand.

High-Performance SAS Coding - Third Edition

mkeintz · Posted 07-02-2020 10:39 AM

@ChrisNZ

I think you may have looked only at my second example. The first example does produce only one flag per id. The reason for the second example was just to show how easy it would be to generate flags for the first X qualifying records for each ID.

And just to gild the lily. What if the OP wanted to flag, say the 2nd through 3rd qualifying records. It would be still be relatively straightforward using the conditional lag functions. It would be:

the first 3, i.e. where lag3(id)^=id

starting with the 2nd, where lag(id)=id

resulting in:

data want;
  set have;
  if var>7 then do;
    if id^=lag3(id) and lag(id)=id then flag=1;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

tocilj · Posted 07-04-2020 11:49 PM

data have;
input id $    var ;
cards;
a       1
a       7
a       10
b        8
b        9
;

proc sort data=have out=haves;
by id descending var;
run;

data want;
set haves;
by id descending var;
if (first.id and first.var and var > 7) then flag=1;
else flag=.;
run;

mkeintz · Posted 07-06-2020 12:24 PM

@tocilj

Your code will work, but you don't need the "and first.var" in the subsetting IF condition.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Re: Flagging first variable above a certain value within a by group

Registration is open

SAS Training: Just a Click Away