Solved: Forward Lookups and Conditional Values

thepushkarsingh · Posted 12-16-2023 06:42 AM

I have a dataset like following where I need a flag when Category changed from "P" to any other category during 6 months from last occurrence of category "P" and that flag has to be populated where Category is "P" based on forward lookups -

ID	Date	Category	Flag(want)
A	31-Jan-21	P	1
A	28-Feb-21	P	1
A	30-Jun-21	U	0
A	30-Jul-21	U	0
A	30-Sep-21	C	0
B	31-Jan-21	P	0
B	28-Feb-21	P	0
B	30-Sep-21	U	0
B	31-Oct-21	U	0
B	31-Dec-21	C	0

Any help is deeply appreciated.

s_lassen · Posted 12-16-2023 08:12 AM

Please present your data as a data step, e.g.:

data have;
  input ID $1. +1 Date date9. +1 Category $1.;
  format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;run;

One way to do it is to catch the changes and then use SQL to get the flag:

data changes;
  set have;
  by ID Category notsorted;
  if first.category and not first.id;
run;

proc sql;
  create table want as select *,
    case when category='P' then
    exists(select * from changes where ID=have.id 
                                 and date>have.date and date<=intnx('month',have.date,6,'S')) 
    else 0
    end as Flag
  from have;
quit;

View solution in original post

s_lassen · Posted 12-16-2023 08:12 AM

Please present your data as a data step, e.g.:

data have;
  input ID $1. +1 Date date9. +1 Category $1.;
  format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;run;

One way to do it is to catch the changes and then use SQL to get the flag:

data changes;
  set have;
  by ID Category notsorted;
  if first.category and not first.id;
run;

proc sql;
  create table want as select *,
    case when category='P' then
    exists(select * from changes where ID=have.id 
                                 and date>have.date and date<=intnx('month',have.date,6,'S')) 
    else 0
    end as Flag
  from have;
quit;

thepushkarsingh · Posted 12-18-2023 03:15 AM

Many thanks, though it takes quite a long to complete on my dataset, almost 4 hrs!

mkeintz · Posted 12-17-2023 05:13 PM

Given the data are sorted by ID/DATE, you could make a dataset FLAG_DATES, based on the non-"P" observations, with an observation for the date a flag could become 1 (i.e. 6 months before the earliest non-"P" category, and another for when it must become zero (the latest date for a non-"P" that sufficiently overlaps with the other non-"P"). This could accommodate instances of multiple non-overlapping "_flag=1" windows for a single ID.

Then you could use a MERGE of that dataset with the original dataset HAVE, accompanied by a conditional SET statement to automatically retain the _FLAG value, as in:

data have ;
  input ID $1. +1 Date date9. +1 Category $1.;
  format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
run;

data flag_dates (keep=id date _flag ) / view=flag_dates;
  set have (where=(category^='P')) ;
  by id;

  _flag=1;
  date=intnx('month',date,-6);
  if first.id=1 or intck('month',lag(date),date)>6 then output;

  merge have (where=(category^='P'))                /*Re-read DATE and get NXT_DATE */
        have (firstobs=2 where=(category^='P') keep=category date rename=(date=nxt_date)) ;

  _flag=0;
  if last.id=1  or intck('month',date,nxt_date)>6 then output;
run;


data want (drop=_:);
  merge have (in=keep) flag_dates (in=inflg drop=_flag);
  by id date;

  if inflg=1 then set flag_dates (keep=_flag);
  if keep;
  if category='P' then flag=coalesce(_flag,0);
  else flag=0;
run;

You might have to change the calculation of the DATE values in FLAG_DATES. For instance, the

  date=intnx('month',date,-6);

statement generates a 6-month prior at the beginning for 6-months prior (i.e. 1st of the month). You might want the "same day of month" calculation (so modify the INTNX function). And you'll have to decide whether that date is within the window, or just outside of it. I.e. you might need to add 1 to the resulting date value. Same concept will need to be applied when considering whether the interval from the current DATE to the NXT_DATE is small enough to extend the end-of-window to next date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 12-17-2023 09:35 PM

/*
Assuming I understood what you mean.
And your dataset has been sorted by ID and DATE.
*/

data have ;
  input ID $1. +1 Date date9. +1 Category $1.;
  format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;


data temp;
 set have;
 by id Category notsorted;
 group+first.Category;
run;
data key;
 merge temp temp(keep=id date Category rename=(id=_id date=_date Category=_Category ) firstobs=2);
 if id=_id and Category='P' and _Category ne 'P' and .<intck('month',date,_date,'c')<=6;
 keep group;
run;
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key');
  h.definekey('group');
  h.definedone();
 end;
set temp;
flag= ( h.check()=0 );
run;

thepushkarsingh · Posted 12-18-2023 03:16 AM

Thank you so much, this is most efficient on my large dataset.

Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Re: Forward Lookups and Conditional Values

Classroom Training Available!