BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WAL83
Obsidian | Level 7

Hello,

 

I am trying to figure out a way to add numbers (PAIR_SEQ column) when a patient's sequence of paired data (IntervalType=REST/ACTIVE where PAIR =1, 2, 3 etc) has a break in consecutive days between StartDate. For example, my dataset looks like the below table where PAIR_SEQ increases by 1 when days (StartDate) are non-sequential between paired records. See LINES = 11, 13, 19, 25, 31, 47 as examples. Any help is greatly appreciated.

 

LINE ID DataStartDate IntervalType PAIR StartDate PAIR_SEQ
1 1 19-Feb-13 REST 1 20-Feb-13 1
2 1 19-Feb-13 ACTIVE 1 20-Feb-13 1
3 1 19-Feb-13 REST 2 21-Feb-13 1
4 1 19-Feb-13 ACTIVE 2 21-Feb-13 1
5 1 19-Feb-13 REST 3 22-Feb-13 1
6 1 19-Feb-13 ACTIVE 3 22-Feb-13 1
7 1 19-Feb-13 REST 4 23-Feb-13 1
8 1 19-Feb-13 ACTIVE 4 23-Feb-13 1
9 1 19-Feb-13 REST 5 24-Feb-13 1
10 1 19-Feb-13 ACTIVE 5 24-Feb-13 1
11 1 26-Feb-13 REST 1 27-Feb-13 2
12 1 26-Feb-13 ACTIVE 1 27-Feb-13 2
13 1 26-Feb-13 REST 1 2-Mar-13 3
14 1 26-Feb-13 ACTIVE 1 2-Mar-13 3
15 1 26-Feb-13 REST 2 3-Mar-13 3
16 1 26-Feb-13 ACTIVE 2 3-Mar-13 3
17 1 26-Feb-13 REST 3 4-Mar-13 3
18 1 26-Feb-13 ACTIVE 3 4-Mar-13 3
19 1 11-Mar-13 REST 1 12-Mar-13 4
20 1 11-Mar-13 ACTIVE 1 12-Mar-13 4
21 1 11-Mar-13 REST 2 13-Mar-13 4
22 1 11-Mar-13 ACTIVE 2 13-Mar-13 4
23 1 11-Mar-13 REST 3 14-Mar-13 4
24 1 11-Mar-13 ACTIVE 3 14-Mar-13 4
25 1 18-Mar-13 REST 1 19-Mar-13 5
26 1 18-Mar-13 ACTIVE 1 19-Mar-13 5
27 1 18-Mar-13 REST 2 20-Mar-13 5
28 1 18-Mar-13 ACTIVE 2 20-Mar-13 5
29 1 18-Mar-13 REST 3 21-Mar-13 5
30 1 18-Mar-13 ACTIVE 3 21-Mar-13 5
31 1 18-Mar-13 REST 1 24-Mar-13 6
32 1 18-Mar-13 ACTIVE 1 24-Mar-13 6
33 1 18-Mar-13 REST 2 25-Mar-13 6
34 1 18-Mar-13 ACTIVE 2 25-Mar-13 6
35 1 18-Mar-13 REST 3 26-Mar-13 6
36 1 18-Mar-13 ACTIVE 3 26-Mar-13 6
37 1 18-Mar-13 REST 4 27-Mar-13 6
38 1 18-Mar-13 ACTIVE 4 27-Mar-13 6
39 1 18-Mar-13 REST 5 28-Mar-13 6
40 1 18-Mar-13 ACTIVE 5 28-Mar-13 6
41 1 18-Mar-13 REST 6 29-Mar-13 6
42 1 18-Mar-13 ACTIVE 6 29-Mar-13 6
43 1 18-Mar-13 REST 7 30-Mar-13 6
44 1 18-Mar-13 ACTIVE 7 30-Mar-13 6
45 1 18-Mar-13 REST 8 31-Mar-13 6
46 1 18-Mar-13 ACTIVE 8 31-Mar-13 6
47 1 16-Apr-13 REST 1 17-Apr-13 7
48 1 16-Apr-13 ACTIVE 1 17-Apr-13 7
49 1 16-Apr-13 REST 2 18-Apr-13 7
50 1 16-Apr-13 ACTIVE 2 18-Apr-13 7
51 1 16-Apr-13 REST 3 19-Apr-13 7
52 1 16-Apr-13 ACTIVE 3 19-Apr-13 7
53 1 16-Apr-13 REST 4 20-Apr-13 7
54 1 16-Apr-13 ACTIVE 4 20-Apr-13 7
55 1 16-Apr-13 REST 5 21-Apr-13 7
55 1 16-Apr-13 ACTIVE 5 21-Apr-13 7
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

One way or another, that will take going through the data twice.  SAS doesn't have a way to tell ahead of time how many observations there will be in a grouping.  Here's one way:

 

data really_want;

merge want ( in=keep_these keep=id pair_seq within_seq where=(within_seq=6))

      want;

by id pair_seq;

if keep_these;

run;

 

It's important to mention the data sets in the right order in the MERGE statement.  And you would have to ignore the warnings about WITHIN_SEQ being part of both data sets and one value overwriting the other.

 

@WAL83 to address the final concern, taking just the first set of records that satisfy X=3 ...

 

data take_these;

set want;

by id;

where within_seq=6;

if first.id;

keep id pair_seq;

run;

 

data really_want;

merge want take_these (in=take);

by id pair_seq;

if take;

run;

View solution in original post

10 REPLIES 10
ballardw
Super User

Before starting are your "date" variables actually SAS date values, i.e. numeric with a SAS date format applied, or are they character?

If character we'll need to add a step to get the numeric versions for comparisons.

 

Since line 13 has the same date as line 12 is your rule description incomplete or should line 13 NOT have an increment?

 

I would also request clarification that 1) your data is organized by ID and 2) when the ID changes from 1 to 2, which is extremely likely to have break in sequence, that you want to restart the pair sequence value at 1.

WAL83
Obsidian | Level 7

Thanks for the quick reply.

 

Both Date variables (DataStartDate & StartDate) are both SAS DATE9. numeric.

 

@ line 13 DataStartDate does not change between records however StartDate does. Rules are based on StartDate not DataStartDate. My data is sorted by ID, StartDate, descending IntervalType. Sorry for not including that initially.

 

Yes, when ID changes from 1 to 2 PAIR_SEQ will start over at 1.

 

 

 

 

WAL83
Obsidian | Level 7

That seems to have done the trick. I added '...or gap < 0' since gap was sometimes negative although I don't think it matters based on the sort order. Thank you!

 

data want;

  set have;

  by ID START_DATE descending Interval_Type;

  gap = dif(START_DATE);

  if first.ID then pair_seq=1;

  else if gap > 1 or gap < 0 then pair_seq + 1;

  *drop gap;

run;

Astounding
PROC Star

Maybe I'm simplifying this too much, but I think this does the trick:

 

data want;

set have;

by id;

gap = dif(startdate);

if first.id then pair_seq=1;

else if gap > 1 then pair_seq + 1;

drop gap;

run;

WAL83
Obsidian | Level 7

 Thanks again for your help.

 

After reviewing my data more thoroughly I found some anomalies that were not accouted for. My goal is to actually only keep records that have the first set of X number of consecutive pairs. The code you provided worked perfect but when I tried to reduce the set to where pair_seq=1, for some IDs it returned records with 1 pair or 2 consecutive pairs when those IDs had 3 consecutive pairs where pair_seq=2 or more. I provided 3 example cases below. Any help is much apprieciated.

 

The sort order of the data is: order by ID, STARTDATE, IntervalType desc;

 

Example 1, for first X=3 consecutive pairs (pair=1,1,2,2,3,3), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3

 

Example 2, for first X=2 consecutive pairs (pair=1,1,2,2), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3

 

Example 3, for first X=4 consecutive pairs (pair=1,1,2,2,3,3,4,4), I would only keep records in GREEN where ID = 1, 2 & 3 & drop records in RED

 

ID DataStartDate IntervalType pair StartDate gap pair_seq
1 2-Apr-14 REST 1 3-Apr-14 38 1
1 2-Apr-14 ACTIVE 1 3-Apr-14 0 1
1 2-Apr-14 REST 2 4-Apr-14 1 1
1 2-Apr-14 ACTIVE 2 4-Apr-14 0 1
1 2-Apr-14 REST 3 5-Apr-14 1 1
1 2-Apr-14 ACTIVE 3 5-Apr-14 0 1
1 2-Apr-14 REST 4 6-Apr-14 1 1
1 2-Apr-14 ACTIVE 4 6-Apr-14 0 1
1 2-Apr-14 REST 5 7-Apr-14 1 1
1 2-Apr-14 ACTIVE 5 7-Apr-14 0 1
2 26-Mar-13 REST 1 27-Mar-13 -14 1
2 26-Mar-13 ACTIVE 1 27-Mar-13 0 1
2 1-Apr-13 REST 1 2-Apr-13 6 2
2 1-Apr-13 ACTIVE 1 2-Apr-13 0 2
2 1-Apr-13 REST 2 3-Apr-13 1 2
2 1-Apr-13 ACTIVE 2 3-Apr-13 0 2
2 1-Apr-13 REST 3 4-Apr-13 1 2
2 1-Apr-13 ACTIVE 3 4-Apr-13 0 2
2 1-Apr-13 REST 4 5-Apr-13 1 2
2 1-Apr-13 ACTIVE 4 5-Apr-13 0 2
2 1-Apr-13 REST 5 6-Apr-13 1 2
2 1-Apr-13 ACTIVE 5 6-Apr-13 0 2
2 1-Apr-13 REST 6 7-Apr-13 1 2
2 1-Apr-13 ACTIVE 6 7-Apr-13 0 2
2 1-Apr-13 REST 7 8-Apr-13 1 2
2 1-Apr-13 ACTIVE 7 8-Apr-13 0 2
3 12-Sep-14 REST 1 13-Sep-14 -5 1
3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1
3 12-Sep-14 REST 2 14-Sep-14 1 1
3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1
3 16-Oct-14 REST 1 17-Oct-14 33 2
3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2
3 16-Oct-14 REST 1 20-Oct-14 3 3
3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3
3 16-Oct-14 REST 2 21-Oct-14 1 3
3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3
3 16-Oct-14 REST 3 22-Oct-14 1 3
3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3
3 16-Oct-14 REST 4 23-Oct-14 1 3
3 16-Oct-14 ACTIVE 4 23-Oct-14 0 3
Astounding
PROC Star

It gets a bit more complex ... you need to set up an additional counter.  This is untested but looks right:

 

data want;

  retain x 3;   /* or 4 or whatever you choose */

  set have;

  by ID START_DATE descending Interval_Type;

  gap = dif(START_DATE);

  if first.ID then do;

     pair_seq=1;

     within_seq=1;

  end;

  else if gap > 1 or gap < 0 then do;

      pair_seq + 1;

      within_seq=1;

  end;

  else within_seq + 1;

  if within_seq > 2 * x then delete;

  *drop gap;

run;

WAL83
Obsidian | Level 7

Thank you. Your code is counting within_seq correctly and

 

if within_seq > 2 * x then delete;

 

removes records where within_seq > 6 when X=3, but pairs with less than X=3 are still present. How can I delete these? For example, the first sequence of X=3 pairs is pair_seq=3 for ID=3. I want to delete those records in red.

 

x ID DataStartDate IntervalType pair StartDate gap pair_seq within_seq
3 3 12-Sep-14 REST 1 13-Sep-14 -5 1 1
3 3 12-Sep-14 ACTIVE 1 13-Sep-14 0 1 2
3 3 12-Sep-14 REST 2 14-Sep-14 1 1 3
3 3 12-Sep-14 ACTIVE 2 14-Sep-14 0 1 4
3 3 16-Oct-14 REST 1 17-Oct-14 33 2 1
3 3 16-Oct-14 ACTIVE 1 17-Oct-14 0 2 2
3 3 16-Oct-14 REST 1 20-Oct-14 3 3 1
3 3 16-Oct-14 ACTIVE 1 20-Oct-14 0 3 2
3 3 16-Oct-14 REST 2 21-Oct-14 1 3 3
3 3 16-Oct-14 ACTIVE 2 21-Oct-14 0 3 4
3 3 16-Oct-14 REST 3 22-Oct-14 1 3 5
3 3 16-Oct-14 ACTIVE 3 22-Oct-14 0 3

6

 

 

 

Astounding
PROC Star

One way or another, that will take going through the data twice.  SAS doesn't have a way to tell ahead of time how many observations there will be in a grouping.  Here's one way:

 

data really_want;

merge want ( in=keep_these keep=id pair_seq within_seq where=(within_seq=6))

      want;

by id pair_seq;

if keep_these;

run;

 

It's important to mention the data sets in the right order in the MERGE statement.  And you would have to ignore the warnings about WITHIN_SEQ being part of both data sets and one value overwriting the other.

 

@WAL83 to address the final concern, taking just the first set of records that satisfy X=3 ...

 

data take_these;

set want;

by id;

where within_seq=6;

if first.id;

keep id pair_seq;

run;

 

data really_want;

merge want take_these (in=take);

by id pair_seq;

if take;

run;

WAL83
Obsidian | Level 7

Thank you.

 

The data step removes records with within_seq < 6 when x=3 but I also need to only keep the first pair_seq where X=3. Some IDs have more than 1 set of within_seq=1,2,3,4,5,6 when X=3. If I filter the dataset on pair_seq=1 I will miss records where their first within_seq=1,2,3,4,5,6 is greater than pair_seq=1 (ID=3).

 

For example, ID=5 has 3 pair_seq=1,2,3 where within_seq=1,2,3,4,5,6. I only need to keep the first pair_seq (pair_seq=1). I think this is the last issue to account for to achieve my goal.

 

ID pair_seq within_seq x Data Start Date Interval Type pair Start Date gap
3 3 1 3 16-Oct-14 REST 1 20-Oct-14 3
3 3 2 3 16-Oct-14 ACTIVE 1 20-Oct-14 0
3 3 3 3 16-Oct-14 REST 2 21-Oct-14 1
3 3 4 3 16-Oct-14 ACTIVE 2 21-Oct-14 0
3 3 5 3 16-Oct-14 REST 3 22-Oct-14 1
3 3 6 3 16-Oct-14 ACTIVE 3 22-Oct-14 0
5 1 1 3 22-May-13 REST 1 23-May-13 -42
5 1 2 3 22-May-13 ACTIVE 1 23-May-13 0
5 1 3 3 22-May-13 REST 2 24-May-13 1
5 1 4 3 22-May-13 ACTIVE 2 24-May-13 0
5 1 5 3 22-May-13 REST 3 25-May-13 1
5 1 6 3 22-May-13 ACTIVE 3 25-May-13 0
5 2 1 3 15-Jul-13 REST 1 16-Jul-13 42
5 2 2 3 15-Jul-13 ACTIVE 1 16-Jul-13 0
5 2 3 3 15-Jul-13 REST 2 17-Jul-13 1
5 2 4 3 15-Jul-13 ACTIVE 2 17-Jul-13 0
5 2 5 3 15-Jul-13 REST 3 18-Jul-13 1
5 2 6 3 15-Jul-13 ACTIVE 3 18-Jul-13 0
5 3 1 3 15-Jul-13 REST 1 29-Jul-13 6
5 3 2 3 15-Jul-13 ACTIVE 1 29-Jul-13 0
5 3 3 3 15-Jul-13 REST 2 30-Jul-13 1
5 3 4 3 15-Jul-13 ACTIVE 2 30-Jul-13 0
5 3 5 3 15-Jul-13 REST 3 31-Jul-13 1
5 3 6 3 15-Jul-13 ACTIVE 3 31-Jul-13 0
WAL83
Obsidian | Level 7

That works perfect! Thank you!!!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1382 views
  • 1 like
  • 3 in conversation