I have a dataset like following where I need a flag when Category changed from "P" to any other category during 6 months from last occurrence of category "P" and that flag has to be populated where Category is "P" based on forward lookups -
ID | Date | Category | Flag(want) |
A | 31-Jan-21 | P | 1 |
A | 28-Feb-21 | P | 1 |
A | 30-Jun-21 | U | 0 |
A | 30-Jul-21 | U | 0 |
A | 30-Sep-21 | C | 0 |
B | 31-Jan-21 | P | 0 |
B | 28-Feb-21 | P | 0 |
B | 30-Sep-21 | U | 0 |
B | 31-Oct-21 | U | 0 |
B | 31-Dec-21 | C | 0 |
Any help is deeply appreciated.
Please present your data as a data step, e.g.:
data have;
input ID $1. +1 Date date9. +1 Category $1.;
format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;run;
One way to do it is to catch the changes and then use SQL to get the flag:
data changes;
set have;
by ID Category notsorted;
if first.category and not first.id;
run;
proc sql;
create table want as select *,
case when category='P' then
exists(select * from changes where ID=have.id
and date>have.date and date<=intnx('month',have.date,6,'S'))
else 0
end as Flag
from have;
quit;
Please present your data as a data step, e.g.:
data have;
input ID $1. +1 Date date9. +1 Category $1.;
format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;run;
One way to do it is to catch the changes and then use SQL to get the flag:
data changes;
set have;
by ID Category notsorted;
if first.category and not first.id;
run;
proc sql;
create table want as select *,
case when category='P' then
exists(select * from changes where ID=have.id
and date>have.date and date<=intnx('month',have.date,6,'S'))
else 0
end as Flag
from have;
quit;
Given the data are sorted by ID/DATE, you could make a dataset FLAG_DATES, based on the non-"P" observations, with an observation for the date a flag could become 1 (i.e. 6 months before the earliest non-"P" category, and another for when it must become zero (the latest date for a non-"P" that sufficiently overlaps with the other non-"P"). This could accommodate instances of multiple non-overlapping "_flag=1" windows for a single ID.
Then you could use a MERGE of that dataset with the original dataset HAVE, accompanied by a conditional SET statement to automatically retain the _FLAG value, as in:
data have ;
input ID $1. +1 Date date9. +1 Category $1.;
format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
run;
data flag_dates (keep=id date _flag ) / view=flag_dates;
set have (where=(category^='P')) ;
by id;
_flag=1;
date=intnx('month',date,-6);
if first.id=1 or intck('month',lag(date),date)>6 then output;
merge have (where=(category^='P')) /*Re-read DATE and get NXT_DATE */
have (firstobs=2 where=(category^='P') keep=category date rename=(date=nxt_date)) ;
_flag=0;
if last.id=1 or intck('month',date,nxt_date)>6 then output;
run;
data want (drop=_:);
merge have (in=keep) flag_dates (in=inflg drop=_flag);
by id date;
if inflg=1 then set flag_dates (keep=_flag);
if keep;
if category='P' then flag=coalesce(_flag,0);
else flag=0;
run;
You might have to change the calculation of the DATE values in FLAG_DATES. For instance, the
date=intnx('month',date,-6);
statement generates a 6-month prior at the beginning for 6-months prior (i.e. 1st of the month). You might want the "same day of month" calculation (so modify the INTNX function). And you'll have to decide whether that date is within the window, or just outside of it. I.e. you might need to add 1 to the resulting date value. Same concept will need to be applied when considering whether the interval from the current DATE to the NXT_DATE is small enough to extend the end-of-window to next date.
/*
Assuming I understood what you mean.
And your dataset has been sorted by ID and DATE.
*/
data have ;
input ID $1. +1 Date date9. +1 Category $1.;
format date date9.;
cards;
A 31-Jan-21 P
A 28-Feb-21 P
A 30-Jun-21 U
A 30-Jul-21 U
A 30-Sep-21 C
B 31-Jan-21 P
B 28-Feb-21 P
B 30-Sep-21 U
B 31-Oct-21 U
B 31-Dec-21 C
;
data temp;
set have;
by id Category notsorted;
group+first.Category;
run;
data key;
merge temp temp(keep=id date Category rename=(id=_id date=_date Category=_Category ) firstobs=2);
if id=_id and Category='P' and _Category ne 'P' and .<intck('month',date,_date,'c')<=6;
keep group;
run;
data want;
if _n_=1 then do;
if 0 then set key;
declare hash h(dataset:'key');
h.definekey('group');
h.definedone();
end;
set temp;
flag= ( h.check()=0 );
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.