Hi,
I am looking for an efficient way to create two flags based on ID level patterns of observation types and I am struggling to find pertinent documentation. With this data the focus tends be on the most recent observation for an ID, but in some cases I need to look back when certain criteria are met.
The first flag I would like to create is when there are two observations with type=99, but only when there are no observations with type 10 or 4 in between them.
The second flag I would like to create is when value= 2, but only in cases where the following two observations have value =1 and then value= 0 respectively (no missing values may be in between). Ideally in this instance I would also pull the date from the most recent observation with value=0 to the observation with the value=2.
With respect to flag1 - in the past I kept only type=99 observations and used lag to pull in values from the prior observation. However that approach will no longer work because if there are one or more observations in between that do have type 10 or 4 then that ID is disqualified. Flag2 is a new need so I haven't coded it yet. My thought was to use a double lag, but in the rare instance that there are multiple zeros, ideally I would pull in the date from the most recent 0.
Any suggestions are much appreciated!
Here is simplified input data for have and want datasets-
data have;
input ID $ date date9. type$ type2;
format date date9.;
datalines;
A 06JUN2025 11 2
A 13JUN2025 1 1
A 13JUN2025 10 2
A 15JUN2025 1 1
A 23JUN2025 99 0
A 27JUN2025 99 0
B 13JUN2025 1 .
B 14JUN2025 99 .
B 16JUN2025 11 2
B 23JUN2025 1 1
B 26JUN2025 99 0
C 15JUN2025 99 .
C 23JUN2025 10 .
C 27JUN2025 99 .
D 01JUN2025 10 2
D 5JUN2025 99 1
D 20JUN2025 3 0
D 29JUN2025 99 0
;
run;
data want;
input ID $ date date9. type$ flag1 type2 flag2 date2 date9.;
format date date2 date9.;
datalines;
A 06JUN2025 11 . 2 .
A 13JUN2025 1 . 1 .
A 13JUN2025 10 . 2 1 27JUN2025
A 15JUN2025 1 . 1 .
A 23JUN2025 99 1 0 .
A 27JUN2025 99 1 0 .
B 13JUN2025 1 . . .
B 14JUN2025 99 0 . .
B 16JUN2025 11 . 2 1 26JUN2025
B 23JUN2025 1 . 1 .
B 26JUN2025 99 0 0 .
C 15JUN2025 99 0 . .
C 23JUN2025 10 . . .
C 27JUN2025 99 0 . .
D 01JUN2025 10 . 2 1 29JUN2025
D 5JUN2025 99 1 1 .
D 20JUN2025 3 . 0. .
D 29JUN2025 99 1 0 .
;
run;
If I understand you correctly, and if my conjecture about ID=B is correct, then you can use the following code:
data want (drop=_: nxt:);
do _n=1 by 1 until (last.id);
set have;
by id;
array _chk{30};
if type='99' then do;
if _last99^=. then do;
_chk{_last99}=1;
_chk{_n}=1;
end;
_last99=_n;
end;
if type in ('10','4') then _last99=.;
if type2=0 then _last_t2_eq_0_date=date;
end;
/* With the _chk array and _last_t2_eq_0_date in hand, reread this ID and set flags*/
do _p=1 to _n;
merge have
have (firstobs=2 keep=id type2 rename=(id=nxt1_id type2=nxt1_t2))
have (firstobs=3 keep=id type2 rename=(id=nxt2_id type2=nxt2_t2));
if type='99' then flag1=sum(0,_chk{_p});
else flag1=.;
if id=nxt2_id and type2=2 and nxt1_t2=1 and nxt2_t2=0 then flag2=1;
else flag2=.;
if flag2=1 then date2=_last_t2_eq_0_date;
else date2=.;
output;
end;
format date2 date9. ;
run;
It reads each ID twice. the first time it checks for type='99' that are not separated by '4' or '10', and set dummies in the _CHK array accordingly. It also keeps track of the last date for type2=0.
In the second pass of data, the flags are set, using the array for flag1. For flag2 the second pass uses the firstobs= options in a MERGE statement to look ahead for a pattern of type2=2 for current, followed by a type2=1 and type2=0. Note the MERGE must NOT use a BY ID statement. If it did, then the alignment set up by the FIRSTOBS parameters would be lost at the beginning of the second ID.
If I understand you correctly, and if my conjecture about ID=B is correct, then you can use the following code:
data want (drop=_: nxt:);
do _n=1 by 1 until (last.id);
set have;
by id;
array _chk{30};
if type='99' then do;
if _last99^=. then do;
_chk{_last99}=1;
_chk{_n}=1;
end;
_last99=_n;
end;
if type in ('10','4') then _last99=.;
if type2=0 then _last_t2_eq_0_date=date;
end;
/* With the _chk array and _last_t2_eq_0_date in hand, reread this ID and set flags*/
do _p=1 to _n;
merge have
have (firstobs=2 keep=id type2 rename=(id=nxt1_id type2=nxt1_t2))
have (firstobs=3 keep=id type2 rename=(id=nxt2_id type2=nxt2_t2));
if type='99' then flag1=sum(0,_chk{_p});
else flag1=.;
if id=nxt2_id and type2=2 and nxt1_t2=1 and nxt2_t2=0 then flag2=1;
else flag2=.;
if flag2=1 then date2=_last_t2_eq_0_date;
else date2=.;
output;
end;
format date2 date9. ;
run;
It reads each ID twice. the first time it checks for type='99' that are not separated by '4' or '10', and set dummies in the _CHK array accordingly. It also keeps track of the last date for type2=0.
In the second pass of data, the flags are set, using the array for flag1. For flag2 the second pass uses the firstobs= options in a MERGE statement to look ahead for a pattern of type2=2 for current, followed by a type2=1 and type2=0. Note the MERGE must NOT use a BY ID statement. If it did, then the alignment set up by the FIRSTOBS parameters would be lost at the beginning of the second ID.
Thank you, I appreciate your response! I thought I responded on Saturday, but I can't seem to find my post for some reason.
The code you provided works with a slight modification - which is my fault. I can see that for flag1 I specified that type 10 or 4 cannot be in between, type 11 should have been in that list. Once I add the 11 the code works correctly.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.