DATA Step, Macro, Functions and more

Flag break in sequence of pairs

Accepted Solution Solved
Reply
Contributor
Posts: 41
Accepted Solution

Flag break in sequence of pairs

Hello,

 

I am trying to figure out a way to add numbers (PAIR_SEQ column) when a patient's sequence of paired data (IntervalType=REST/ACTIVE where PAIR =1, 2, 3 etc) has a break in consecutive days between StartDate. For example, my dataset looks like the below table where PAIR_SEQ increases by 1 when days (StartDate) are non-sequential between paired records. See LINES = 11, 13, 19, 25, 31, 47 as examples. Any help is greatly appreciated.

 

LINE ID DataStartDate IntervalType PAIR StartDate PAIR_SEQ
1 1 19-Feb-13 REST 1 20-Feb-13 1
2 1 19-Feb-13 ACTIVE 1 20-Feb-13 1
3 1 19-Feb-13 REST 2 21-Feb-13 1
4 1 19-Feb-13 ACTIVE 2 21-Feb-13 1
5 1 19-Feb-13 REST 3 22-Feb-13 1
6 1 19-Feb-13 ACTIVE 3 22-Feb-13 1
7 1 19-Feb-13 REST 4 23-Feb-13 1
8 1 19-Feb-13 ACTIVE 4 23-Feb-13 1
9 1 19-Feb-13 REST 5 24-Feb-13 1
10 1 19-Feb-13 ACTIVE 5 24-Feb-13 1
11 1 26-Feb-13 REST 1 27-Feb-13 2
12 1 26-Feb-13 ACTIVE 1 27-Feb-13 2
13 1 26-Feb-13 REST 1 2-Mar-13 3
14 1 26-Feb-13 ACTIVE 1 2-Mar-13 3
15 1 26-Feb-13 REST 2 3-Mar-13 3
16 1 26-Feb-13 ACTIVE 2 3-Mar-13 3
17 1 26-Feb-13 REST 3 4-Mar-13 3
18 1 26-Feb-13 ACTIVE 3 4-Mar-13 3
19 1 11-Mar-13 REST 1 12-Mar-13 4
20 1 11-Mar-13 ACTIVE 1 12-Mar-13 4
21 1 11-Mar-13 REST 2 13-Mar-13 4
22 1 11-Mar-13 ACTIVE 2 13-Mar-13 4
23 1 11-Mar-13 REST 3 14-Mar-13 4
24 1 11-Mar-13 ACTIVE 3 14-Mar-13 4
25 1 18-Mar-13 REST 1 19-Mar-13 5
26 1 18-Mar-13 ACTIVE 1 19-Mar-13 5
27 1 18-Mar-13 REST 2 20-Mar-13 5
28 1 18-Mar-13 ACTIVE 2 20-Mar-13 5
29 1 18-Mar-13 REST 3 21-Mar-13 5
30 1 18-Mar-13 ACTIVE 3 21-Mar-13 5
31 1 18-Mar-13 REST 1 24-Mar-13 6
32 1 18-Mar-13 ACTIVE 1 24-Mar-13 6
33 1 18-Mar-13 REST 2 25-Mar-13 6
34 1 18-Mar-13 ACTIVE 2 25-Mar-13 6
35 1 18-Mar-13 REST 3 26-Mar-13 6
36 1 18-Mar-13 ACTIVE 3 26-Mar-13 6
37 1 18-Mar-13 REST 4 27-Mar-13 6
38 1 18-Mar-13 ACTIVE 4 27-Mar-13 6
39 1 18-Mar-13 REST 5 28-Mar-13 6
40 1 18-Mar-13 ACTIVE 5 28-Mar-13 6
41 1 18-Mar-13 REST 6 29-Mar-13 6
42 1 18-Mar-13 ACTIVE 6 29-Mar-13 6
43 1 18-Mar-13 REST 7 30-Mar-13 6
44 1 18-Mar-13 ACTIVE 7 30-Mar-13 6
45 1 18-Mar-13 REST 8 31-Mar-13 6
46 1 18-Mar-13 ACTIVE 8 31-Mar-13 6
47 1 16-Apr-13 REST 1 17-Apr-13 7
48 1 16-Apr-13 ACTIVE 1 17-Apr-13 7
49 1 16-Apr-13 REST 2 18-Apr-13 7
50 1 16-Apr-13 ACTIVE 2 18-Apr-13 7
51 1 16-Apr-13 REST 3 19-Apr-13 7
52 1 16-Apr-13 ACTIVE 3 19-Apr-13 7
53 1 16-Apr-13 REST 4 20-Apr-13 7
54 1 16-Apr-13 ACTIVE 4 20-Apr-13 7
55 1 16-Apr-13 REST 5 21-Apr-13 7
55 1 16-Apr-13 ACTIVE 5 21-Apr-13 7

Accepted Solutions
Solution
‎07-18-2017 01:16 PM
Super User
Posts: 5,084

Re: Flag break in sequence of pairs

[ Edited ]

One way or another, that will take going through the data twice.  SAS doesn't have a way to tell ahead of time how many observations there will be in a grouping.  Here's one way:

 

data really_want;

merge want ( in=keep_these keep=id pair_seq within_seq where=(within_seq=6))

      want;

by id pair_seq;

if keep_these;

run;

 

It's important to mention the data sets in the right order in the MERGE statement.  And you would have to ignore the warnings about WITHIN_SEQ being part of both data sets and one value overwriting the other.

 

@WAL83 to address the final concern, taking just the first set of records that satisfy X=3 ...

 

data take_these;

set want;

by id;

where within_seq=6;

if first.id;

keep id pair_seq;

run;

 

data really_want;

merge want take_these (in=take);

by id pair_seq;

if take;

run;

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Flag break in sequence of pairs

Before starting are your "date" variables actually SAS date values, i.e. numeric with a SAS date format applied, or are they character?

If character we'll need to add a step to get the numeric versions for comparisons.

 

Since line 13 has the same date as line 12 is your rule description incomplete or should line 13 NOT have an increment?

 

I would also request clarification that 1) your data is organized by ID and 2) when the ID changes from 1 to 2, which is extremely likely to have break in sequence, that you want to restart the pair sequence value at 1.

Contributor
Posts: 41

Re: Flag break in sequence of pairs

Thanks for the quick reply.

 

Both Date variables (DataStartDate & StartDate) are both SAS DATE9. numeric.

 

@ line 13 DataStartDate does not change between records however StartDate does. Rules are based on StartDate not DataStartDate. My data is sorted by ID, StartDate, descending IntervalType. Sorry for not including that initially.

 

Yes, when ID changes from 1 to 2 PAIR_SEQ will start over at 1.

 

 

 

 

Contributor
Posts: 41

Re: Flag break in sequence of pairs

That seems to have done the trick. I added '...or gap < 0' since gap was sometimes negative although I don't think it matters based on the sort order. Thank you!

 

data want;

  set have;

  by ID START_DATE descending Interval_Type;

  gap = dif(START_DATE);

  if first.ID then pair_seq=1;

  else if gap > 1 or gap < 0 then pair_seq + 1;

  *drop gap;

run;

Super User
Posts: 5,084

Re: Flag break in sequence of pairs

[ Edited ]

Maybe I'm simplifying this too much, but I think this does the trick:

 

data want;

set have;

by id;

gap = dif(startdate);

if first.id then pair_seq=1;

else if gap > 1 then pair_seq + 1;

drop gap;

run;

Contributor
Posts: 41

Re: Flag break in sequence of pairs

 Thanks again for your help.

 

After reviewing my data more thoroughly I found some anomalies that were not accouted for. My goal is to actually only keep records that have the first set of X number of consecutive pairs. The code you provided worked perfect but when I tried to reduce the set to where pair_seq=1, for some IDs it returned records with 1 pair or 2 consecutive pairs when those IDs had 3 consecutive pairs where pair_seq=2 or more. I provided 3 example cases below. Any help is much apprieciated.

 

The sort order of the data is: order by ID, STARTDATE, IntervalType desc;

 

Example 1, for first X=3 consecutive pairs (pair=1,1,2,2,3,3), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3

 

Example 2, for first X=2 consecutive pairs (pair=1,1,2,2), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3

 

Example 3, for first X=4 consecutive pairs (pair=1,1,2,2,3,3,4,4), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3
Super User
Posts: 5,084

Re: Flag break in sequence of pairs

It gets a bit more complex ... you need to set up an additional counter.  This is untested but looks right:

 

data want;

  retain x 3;   /* or 4 or whatever you choose */

  set have;

  by ID START_DATE descending Interval_Type;

  gap = dif(START_DATE);

  if first.ID then do;

     pair_seq=1;

     within_seq=1;

  end;

  else if gap > 1 or gap < 0 then do;

      pair_seq + 1;

      within_seq=1;

  end;

  else within_seq + 1;

  if within_seq > 2 * x then delete;

  *drop gap;

run;

Contributor
Posts: 41

Re: Flag break in sequence of pairs

Thank you. Your code is counting within_seq correctly and

 

if within_seq > 2 * x then delete;

 

removes records where within_seq > 6 when X=3, but pairs with less than X=3 are still present. How can I delete these? For example, the first sequence of X=3 pairs is pair_seq=3 for ID=3. I want to delete those records in red.

 

x ID DataStartDate IntervalType pair StartDate gap pair_seq within_seq
3 3 12-Sep-14 REST 1 13-Sep-14 -5 1 1
3 3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1 2
3 3 12-Sep-14 REST 2 14-Sep-14 1 1 3
3 3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1 4
3 3 16-Oct-14 REST 1 17-Oct-14 33 2 1
3 3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2 2
3 3 16-Oct-14 REST 1 20-Oct-14 3 3 1
3 3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3 2
3 3 16-Oct-14 REST 2 21-Oct-14 1 3 3
3 3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3 4
3 3 16-Oct-14 REST 3 22-Oct-14 1 3 5
3 3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3

6

 

 

 

Solution
‎07-18-2017 01:16 PM
Super User
Posts: 5,084

Re: Flag break in sequence of pairs

[ Edited ]

One way or another, that will take going through the data twice.  SAS doesn't have a way to tell ahead of time how many observations there will be in a grouping.  Here's one way:

 

data really_want;

merge want ( in=keep_these keep=id pair_seq within_seq where=(within_seq=6))

      want;

by id pair_seq;

if keep_these;

run;

 

It's important to mention the data sets in the right order in the MERGE statement.  And you would have to ignore the warnings about WITHIN_SEQ being part of both data sets and one value overwriting the other.

 

@WAL83 to address the final concern, taking just the first set of records that satisfy X=3 ...

 

data take_these;

set want;

by id;

where within_seq=6;

if first.id;

keep id pair_seq;

run;

 

data really_want;

merge want take_these (in=take);

by id pair_seq;

if take;

run;

Contributor
Posts: 41

Re: Flag break in sequence of pairs

Thank you.

 

The data step removes records with within_seq < 6 when x=3 but I also need to only keep the first pair_seq where X=3. Some IDs have more than 1 set of within_seq=1,2,3,4,5,6 when X=3. If I filter the dataset on pair_seq=1 I will miss records where their first within_seq=1,2,3,4,5,6 is greater than pair_seq=1 (ID=3).

 

For example, ID=5 has 3 pair_seq=1,2,3 where within_seq=1,2,3,4,5,6. I only need to keep the first pair_seq (pair_seq=1). I think this is the last issue to account for to achieve my goal.

 

ID pair_seq within_seq x Data Start Date Interval Type pair Start Date gap
3 3 1 3 16-Oct-14 REST 1 20-Oct-14 3
3 3 2 3 16-Oct-14 ACTIVE 1 20-Oct-14 0
3 3 3 3 16-Oct-14 REST 2 21-Oct-14 1
3 3 4 3 16-Oct-14 ACTIVE 2 21-Oct-14 0
3 3 5 3 16-Oct-14 REST 3 22-Oct-14 1
3 3 6 3 16-Oct-14 ACTIVE 3 22-Oct-14 0
5 1 1 3 22-May-13 REST 1 23-May-13 -42
5 1 2 3 22-May-13 ACTIVE 1 23-May-13 0
5 1 3 3 22-May-13 REST 2 24-May-13 1
5 1 4 3 22-May-13 ACTIVE 2 24-May-13 0
5 1 5 3 22-May-13 REST 3 25-May-13 1
5 1 6 3 22-May-13 ACTIVE 3 25-May-13 0
5 2 1 3 15-Jul-13 REST 1 16-Jul-13 42
5 2 2 3 15-Jul-13 ACTIVE 1 16-Jul-13 0
5 2 3 3 15-Jul-13 REST 2 17-Jul-13 1
5 2 4 3 15-Jul-13 ACTIVE 2 17-Jul-13 0
5 2 5 3 15-Jul-13 REST 3 18-Jul-13 1
5 2 6 3 15-Jul-13 ACTIVE 3 18-Jul-13 0
5 3 1 3 15-Jul-13 REST 1 29-Jul-13 6
5 3 2 3 15-Jul-13 ACTIVE 1 29-Jul-13 0
5 3 3 3 15-Jul-13 REST 2 30-Jul-13 1
5 3 4 3 15-Jul-13 ACTIVE 2 30-Jul-13 0
5 3 5 3 15-Jul-13 REST 3 31-Jul-13 1
5 3 6 3 15-Jul-13 ACTIVE 3 31-Jul-13 0
Contributor
Posts: 41

Re: Flag break in sequence of pairs

That works perfect! Thank you!!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 179 views
  • 1 like
  • 3 in conversation