Solved: Re: Identify patterns across observations

Zen_aly · Posted 07-18-2025 03:56 PM

Hi,

I am looking for an efficient way to create two flags based on ID level patterns of observation types and I am struggling to find pertinent documentation. With this data the focus tends be on the most recent observation for an ID, but in some cases I need to look back when certain criteria are met.

The first flag I would like to create is when there are two observations with type=99, but only when there are no observations with type 10 or 4 in between them.
The second flag I would like to create is when value= 2, but only in cases where the following two observations have value =1 and then value= 0 respectively (no missing values may be in between). Ideally in this instance I would also pull the date from the most recent observation with value=0 to the observation with the value=2.

With respect to flag1 - in the past I kept only type=99 observations and used lag to pull in values from the prior observation. However that approach will no longer work because if there are one or more observations in between that do have type 10 or 4 then that ID is disqualified. Flag2 is a new need so I haven't coded it yet. My thought was to use a double lag, but in the rare instance that there are multiple zeros, ideally I would pull in the date from the most recent 0.

Any suggestions are much appreciated!

Here is simplified input data for have and want datasets-
data have;
input ID $ date date9. type$ type2;
format date date9.;
datalines;
A 06JUN2025 11 2
A 13JUN2025 1 1
A 13JUN2025 10 2
A 15JUN2025 1 1
A 23JUN2025 99 0
A 27JUN2025 99 0
B 13JUN2025 1 .
B 14JUN2025 99 .
B 16JUN2025 11 2
B 23JUN2025 1 1
B 26JUN2025 99 0
C 15JUN2025 99 .
C 23JUN2025 10 .
C 27JUN2025 99 .
D 01JUN2025 10 2
D 5JUN2025 99 1
D 20JUN2025 3 0
D 29JUN2025 99 0
;
run;

data want;
input ID $ date date9. type$ flag1 type2 flag2 date2 date9.;
format date date2 date9.;
datalines;
A 06JUN2025 11 . 2 .
A 13JUN2025 1 . 1 .
A 13JUN2025 10 . 2 1 27JUN2025
A 15JUN2025 1 . 1 .
A 23JUN2025 99 1 0 .
A 27JUN2025 99 1 0 .
B 13JUN2025 1 . . .
B 14JUN2025 99 0 . .
B 16JUN2025 11 . 2 1 26JUN2025
B 23JUN2025 1 . 1 .
B 26JUN2025 99 0 0 .
C 15JUN2025 99 0 . .
C 23JUN2025 10 . . .
C 27JUN2025 99 0 . .
D 01JUN2025 10 . 2 1 29JUN2025
D 5JUN2025 99 1 1 .
D 20JUN2025 3 . 0. .
D 29JUN2025 99 1 0 .
;
run;

mkeintz · Posted 07-18-2025 11:52 PM

First, I assume that flag1 is based on the type variable, and flag2 is based on the type2 variable, correct?
Next: you have two observations for ID B with type='99'. They are separated by two observations with type='11' and type='1'.

So why have you assigned flag1=0 to those 99's, given they are NOT separated by a 10 or a 4, which I understand is the only reason to set flag1 to 0?

If I understand you correctly, and if my conjecture about ID=B is correct, then you can use the following code:

data want (drop=_: nxt:);
  do _n=1 by 1 until (last.id);
    set   have;
    by id;
    array _chk{30};
    if type='99' then do;
      if _last99^=. then do;
        _chk{_last99}=1; 
        _chk{_n}=1;
      end;
      _last99=_n;
    end;
    if type in ('10','4') then _last99=.;
    if type2=0 then _last_t2_eq_0_date=date;
  end;

  /* With the _chk array and _last_t2_eq_0_date in hand, reread this ID and set flags*/
  do _p=1 to _n;
    merge have
          have (firstobs=2 keep=id type2 rename=(id=nxt1_id type2=nxt1_t2))
          have (firstobs=3 keep=id type2 rename=(id=nxt2_id type2=nxt2_t2));
    if type='99' then flag1=sum(0,_chk{_p});
    else flag1=.;

    if id=nxt2_id and type2=2 and nxt1_t2=1 and nxt2_t2=0 then flag2=1;
    else flag2=.;
    if flag2=1 then date2=_last_t2_eq_0_date;
    else date2=.;
    output;
  end;
  format date2 date9. ;
run;

It reads each ID twice. the first time it checks for type='99' that are not separated by '4' or '10', and set dummies in the _CHK array accordingly. It also keeps track of the last date for type2=0.

In the second pass of data, the flags are set, using the array for flag1. For flag2 the second pass uses the firstobs= options in a MERGE statement to look ahead for a pattern of type2=2 for current, followed by a type2=1 and type2=0. Note the MERGE must NOT use a BY ID statement. If it did, then the alignment set up by the FIRSTOBS parameters would be lost at the beginning of the second ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 07-18-2025 11:52 PM

First, I assume that flag1 is based on the type variable, and flag2 is based on the type2 variable, correct?
Next: you have two observations for ID B with type='99'. They are separated by two observations with type='11' and type='1'.

So why have you assigned flag1=0 to those 99's, given they are NOT separated by a 10 or a 4, which I understand is the only reason to set flag1 to 0?

If I understand you correctly, and if my conjecture about ID=B is correct, then you can use the following code:

data want (drop=_: nxt:);
  do _n=1 by 1 until (last.id);
    set   have;
    by id;
    array _chk{30};
    if type='99' then do;
      if _last99^=. then do;
        _chk{_last99}=1; 
        _chk{_n}=1;
      end;
      _last99=_n;
    end;
    if type in ('10','4') then _last99=.;
    if type2=0 then _last_t2_eq_0_date=date;
  end;

  /* With the _chk array and _last_t2_eq_0_date in hand, reread this ID and set flags*/
  do _p=1 to _n;
    merge have
          have (firstobs=2 keep=id type2 rename=(id=nxt1_id type2=nxt1_t2))
          have (firstobs=3 keep=id type2 rename=(id=nxt2_id type2=nxt2_t2));
    if type='99' then flag1=sum(0,_chk{_p});
    else flag1=.;

    if id=nxt2_id and type2=2 and nxt1_t2=1 and nxt2_t2=0 then flag2=1;
    else flag2=.;
    if flag2=1 then date2=_last_t2_eq_0_date;
    else date2=.;
    output;
  end;
  format date2 date9. ;
run;

It reads each ID twice. the first time it checks for type='99' that are not separated by '4' or '10', and set dummies in the _CHK array accordingly. It also keeps track of the last date for type2=0.

In the second pass of data, the flags are set, using the array for flag1. For flag2 the second pass uses the firstobs= options in a MERGE statement to look ahead for a pattern of type2=2 for current, followed by a type2=1 and type2=0. Note the MERGE must NOT use a BY ID statement. If it did, then the alignment set up by the FIRSTOBS parameters would be lost at the beginning of the second ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Zen_aly · Posted 07-22-2025 10:38 AM

Thank you, I appreciate your response! I thought I responded on Saturday, but I can't seem to find my post for some reason.

The code you provided works with a slight modification - which is my fault. I can see that for flag1 I specified that type 10 or 4 cannot be in between, type 11 should have been in that list. Once I add the 11 the code works correctly.

Identify patterns across observations

Re: Identify patterns across observations

Re: Identify patterns across observations

Re: Identify patterns across observations

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away