DATA Step, Macro, Functions and more

For each patient, keep only paired records by date and then do counts based on # of contiguous pairs

Accepted Solution Solved
Reply
Contributor
Posts: 41
Accepted Solution

For each patient, keep only paired records by date and then do counts based on # of contiguous pairs

Hello,

 

I'm trying to determine the number of patients in my dataset that have at least X contiguos paired records by date (DAY & NIGHT RECORD=PAIR). I need to delete records that do not have a PAIR and then create "PAIR#" value to flag contiguous records for the patient. PAIR# = 3 would indicate this is the 3rd contiguous PAIR of records for this patient. Start "PAIR#" over again at 1 if there are non-contiguous paired date records for a patient, in other words PAIR# does not have to be unique for a patient. See below example dataset.

 

ID PAIR# Start Date Start Time Value Delete?
1   12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3   16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3   18-Jan-13 22:00:00 NIGHT DELETE
3   21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4   9-Jan-13 22:00:00 NIGHT DELETE
4   10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5   22-Feb-13 22:00:00 NIGHT DELETE
5   24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6   7-Oct-13 6:00:00 DAY DELETE

Accepted Solutions
Solution
‎02-16-2017 11:29 PM
PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Indeed! I missed the case where it was the last.id and started with an otherwise potentially valid record. This code should correct for that:

 

data want (drop=_:);
  set have;
  by id;
  length action $6;
  retain _next_keep _last_kept_date _pair;
  if first.id then _pair=0;
  if not last.id then do;
    _next=_n_+1;
    set have (keep=start_date value
         rename=(start_date=_next_start
                 value=_next_value)) point=_next;
    if _next_keep='KEEP' then do;
      action='KEEP';
      _last_kept_date=start_date;
      pair=_pair;
      output;
      _next_keep='DELETE';
    end;
    else if start_date eq _next_start and
            value eq 'DAY' and
            _next_value eq 'NIGHT' then do;
      if 0<=(start_date-_last_kept_date)<=1 then _pair+1;
      else _pair=1;
      pair=_pair;
      action='KEEP';
      output;
      _next_keep='KEEP';
    end;
    else do;
      action='DELETE';
      output;
      _next_keep='DELETE';
    end;
  end;
  else do;
    if _next_keep eq 'KEEP' then do;
      action='KEEP';
      pair=_pair;
      output;
      _next_keep='DELETE';
    end;
    else do;
      action='DELETE';
      output;
      _next_keep='DELETE';
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

[ Edited ]

I think that the following does what you want:

 

data want (drop=_:);
set have;
by id;
retain _next_keep pair;
if first.id then pair=0;
if not last.id then do;
_next=_n_+1;
set have (keep=start_date value
rename=(start_date=_next_start
value=_next_value)) point=_next;
if _next_keep='KEEP' then do;
output;
_next_keep='DELETE';
end;
else if start_date eq _next_start and
value eq 'DAY' and
_next_value eq 'NIGHT' then do;
pair+1;
output;
_next_keep='KEEP';
end;
else _next_keep='DELETE';
end;
else if _next_keep eq 'KEEP' then do;
output;
_next_keep='DELETE';
end;
run;

HTH,

Art, CEO, AnalystFinder.com

 

Valued Guide
Posts: 797

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

[ Edited ]

You identify pairs as any two consecutive records with the same id, same start_date, in which one record has value='DAY' and the other has value='NIGHT', in any order.

 

A pair is assigned a pair number, in which the first pair for an id is pair=1, and subsequent pairs are 2, 3, 4, ...    EXCEPT if a given pair is NOT exactly one day after the previous pair, then it is reset to pair=1.

 

data have;
  input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5.  _Delete $6.;
  format start_date date9. start_time time8.0;
datalines;
1   12-Feb-14 6:00:00 DAY DELETE 
1 1 12-Feb-14 6:01:00 DAY KEEP 
1 1 12-Feb-14 22:00:00 NIGHT KEEP 
1 2 13-Feb-14 6:00:00 DAY KEEP 
1 2 13-Feb-14 22:00:00 NIGHT KEEP 
1 3 14-Feb-14 6:00:00 DAY KEEP 
1 3 14-Feb-14 22:00:00 NIGHT KEEP 
3   16-Jan-13 22:00:00 NIGHT DELETE 
3 1 17-Jan-13 6:00:00 DAY KEEP 
3 1 17-Jan-13 22:00:00 NIGHT KEEP 
3   18-Jan-13 22:00:00 NIGHT DELETE 
3   21-Jan-13 22:00:00 NIGHT DELETE 
3 1 22-Jan-13 6:00:00 DAY KEEP 
3 1 22-Jan-13 22:00:00 NIGHT KEEP 
3 2 23-Jan-13 6:00:00 DAY KEEP 
3 2 23-Jan-13 22:00:00 NIGHT KEEP 
4   9-Jan-13 22:00:00 NIGHT DELETE 
4   10-Jan-13 22:00:00 NIGHT DELETE 
4 1 11-Jan-13 6:01:00 DAY KEEP 
4 1 11-Jan-13 22:00:00 NIGHT KEEP 
4 2 12-Jan-13 6:01:00 DAY KEEP 
4 2 12-Jan-13 22:00:00 NIGHT KEEP 
5 1 20-Feb-13 22:00:00 NIGHT KEEP 
5 1 20-Feb-13 6:01:00 DAY KEEP 
5   22-Feb-13 22:00:00 NIGHT DELETE 
5   24-Feb-13 22:00:00 NIGHT DELETE 
5 1 6-Mar-13 6:01:00 DAY KEEP 
5 1 6-Mar-13 22:00:00 NIGHT KEEP 
5 2 7-Mar-13 6:01:00 DAY KEEP 
5 2 7-Mar-13 22:00:00 NIGHT KEEP 
6 1 4-Oct-13 6:00:00 DAY KEEP 
6 1 4-Oct-13 22:00:00 NIGHT KEEP 
6 2 5-Oct-13 6:00:00 DAY KEEP 
6 2 5-Oct-13 22:00:00 NIGHT KEEP 
6   7-Oct-13 6:00:00 DAY DELETE 
run;
data want (drop=p nxt_val p_membr);

  set have (keep=id start_date);
  by id start_date;

  retain p 0       /* to track pair number */
         p_membr . /* 1=beginning of pair, 2=end of a pair */;

  if first.start_date then do until (1);
    if not(first.id=1 or lag(start_date)<start_date-1) then leave;
    p=0;
    p_membr=.;
  end;

  merge have 
        have (firstobs=2 keep=id value rename=(value=nxt_val));

  /* If last record was p_membr=1, this one is p_membr=2*/
  if p_membr=1 then do;
    pair=p;
	p_membr=2;
  end;
  else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
	p=p+1;
	pair=p;
	p_membr=1;
  end;
  else p_membr=.;
    
  if p_membr=. then action='DELETE';
  else action='KEEP';
run;

 

 

NOtes:

  1. The set with BY statement is just used to determine first.id, first.start_date, and corresponding last.'s
  2. The self merge has firstobs=2 so that one can look ahead at the next records VALUE variable, renamed to "nxt_val".
  3. P_MEMBR=1 for the first record in a pair, and p_membr=2 for the 2nd.  Since p_membr is a "retained" value, if it's a one, then the prior record was the beginning of a pair, and current record must be the end of the pair.
Valued Guide
Posts: 505

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

This may be related to the solution you want. Did not see a example of output

data have;
  input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5.  _Delete $6.;
  format start_date date9. start_time time8.0;
  if _pair ne '' then do;
     input #1 (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5.  _Delete $6.
           #2 (xID x_PAIR) (1. +1 1.) xStart_Date :date9. xStart_Time time8.0 xValue :$5.  x_Delete $6.;
     if value='DAY' and xvalue='NIGHT' then output;
  end;
cards4;
1   12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3   16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3   18-Jan-13 22:00:00 NIGHT DELETE
3   21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4   9-Jan-13 22:00:00 NIGHT DELETE
4   10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5   22-Feb-13 22:00:00 NIGHT DELETE
5   24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6   7-Oct-13 6:00:00 DAY DELETE
;;;;
run;quit;
Contributor
Posts: 41

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

My aplogies for not including an example output dataset. Also it may be easier to understand my goal if I only include variables available in my dataset.

 

data have;

input ID 4. Start_Date :date9. Start_Time time8.0 End_Time time8.0 Value :$5.;

format Start_Date date9. Start_Time time8.0 End_time time8.0;

datalines;

1024 22-Feb-14 6:00 21:59 DAY

1024 22-Feb-14 22:00 5:59 NIGHT

1024 23-Feb-14 6:00 21:59 DAY

1024 23-Feb-14 22:00 5:59 NIGHT

1024 24-Feb-14 6:00 21:59 DAY

1024 24-Feb-14 22:00 5:59 NIGHT

1040 3-Apr-14 6:00 21:59 DAY

1040 3-Apr-14 22:00 5:59 NIGHT

1040 4-Apr-14 22:00 5:59 NIGHT

1040 5-Apr-14 6:00 21:59 DAY

1040 5-Apr-14 22:00 5:59 NIGHT

1040 6-Apr-14 22:00 5:59 NIGHT

1040 7-Apr-14 6:00 21:59 DAY

1040 7-Apr-14 22:00 5:59 NIGHT

2003 1-Jul-13 6:00 21:59 DAY

2003 1-Jul-13 22:00 5:59 NIGHT

2003 2-Jul-13 22:00 5:59 NIGHT

2003 3-Jul-13 6:00 21:59 DAY

2003 3-Jul-13 22:00 5:59 NIGHT

2003 4-Jul-13 6:00 21:59 DAY

2003 4-Jul-13 22:00 5:59 NIGHT

2003 5-Jul-13 6:00 21:59 DAY

2010 2-Jun-13 6:01 21:59 DAY

2010 2-Jun-13 22:00 6:00 NIGHT

2010 16-Jul-13 6:01 21:59 DAY

2010 16-Jul-13 22:00 6:00 NIGHT

2010 17-Jul-13 6:01 21:59 DAY

2010 17-Jul-13 22:00 6:00 NIGHT

2010 24-Jul-13 6:01 21:59 DAY

2010 24-Jul-13 22:00 6:00 NIGHT

2010 25-Jul-13 22:00 6:00 NIGHT

2010 26-Jul-13 22:00 6:00 NIGHT

2010 28-Jul-13 22:00 6:00 NIGHT

2010 29-Jul-13 6:01 21:59 DAY

2010 29-Jul-13 22:00 6:00 NIGHT

2010 1-Aug-13 6:01 21:59 DAY

2010 9-Aug-13 6:01 21:59 DAY

2010 9-Aug-13 22:00 6:00 NIGHT

2010 10-Aug-13 6:01 21:59 DAY

2010 10-Aug-13 22:00 6:00 NIGHT

2010 11-Aug-13 22:00 6:00 NIGHT

2010 12-Aug-13 22:00 6:00 NIGHT

2010 1-Mar-14 22:00 5:59 NIGHT

2010 2-Mar-14 22:00 5:59 NIGHT

2010 3-Mar-14 22:00 5:59 NIGHT

2010 5-Mar-14 6:00 21:59 DAY

2010 5-Mar-14 22:00 5:59 NIGHT

2010 6-Mar-14 6:00 21:59 DAY

2010 6-Mar-14 22:00 5:59 NIGHT

run;

 

Output dataset with addition of 2 new variables added (Action, Pair).

 

Action Pair ID Start Date Start Time End Time Value
KEEP 1 1024 22-Feb-14 6:00 21:59 DAY
KEEP 1 1024 22-Feb-14 22:00 5:59 NIGHT
KEEP 2 1024 23-Feb-14 6:00 21:59 DAY
KEEP 2 1024 23-Feb-14 22:00 5:59 NIGHT
KEEP 3 1024 24-Feb-14 6:00 21:59 DAY
KEEP 3 1024 24-Feb-14 22:00 5:59 NIGHT
KEEP 1 1040 3-Apr-14 6:00 21:59 DAY
KEEP 1 1040 3-Apr-14 22:00 5:59 NIGHT
DELETE . 1040 4-Apr-14 22:00 5:59 NIGHT
KEEP 1 1040 5-Apr-14 6:00 21:59 DAY
KEEP 1 1040 5-Apr-14 22:00 5:59 NIGHT
DELETE . 1040 6-Apr-14 22:00 5:59 NIGHT
KEEP 1 1040 7-Apr-14 6:00 21:59 DAY
KEEP 1 1040 7-Apr-14 22:00 5:59 NIGHT
KEEP 1 2003 1-Jul-13 6:00 21:59 DAY
KEEP 1 2003 1-Jul-13 22:00 5:59 NIGHT
DELETE . 2003 2-Jul-13 22:00 5:59 NIGHT
KEEP 1 2003 3-Jul-13 6:00 21:59 DAY
KEEP 1 2003 3-Jul-13 22:00 5:59 NIGHT
KEEP 2 2003 4-Jul-13 6:00 21:59 DAY
KEEP 2 2003 4-Jul-13 22:00 5:59 NIGHT
DELETE . 2003 5-Jul-13 6:00 21:59 DAY
KEEP 1 2010 2-Jun-13 6:01 21:59 DAY
KEEP 1 2010 2-Jun-13 22:00 6:00 NIGHT
KEEP 1 2010 16-Jul-13 6:01 21:59 DAY
KEEP 1 2010 16-Jul-13 22:00 6:00 NIGHT
KEEP 2 2010 17-Jul-13 6:01 21:59 DAY
KEEP 2 2010 17-Jul-13 22:00 6:00 NIGHT
KEEP 3 2010 24-Jul-13 6:01 21:59 DAY
KEEP 3 2010 24-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 25-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 26-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 28-Jul-13 22:00 6:00 NIGHT
KEEP 1 2010 29-Jul-13 6:01 21:59 DAY
KEEP 1 2010 29-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 1-Aug-13 6:01 21:59 DAY
KEEP 1 2010 9-Aug-13 6:01 21:59 DAY
KEEP 1 2010 9-Aug-13 22:00 6:00 NIGHT
KEEP 2 2010 10-Aug-13 6:01 21:59 DAY
KEEP 2 2010 10-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 11-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 12-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 1-Mar-14 22:00 5:59 NIGHT
DELETE . 2010 2-Mar-14 22:00 5:59 NIGHT
DELETE . 2010 3-Mar-14 22:00 5:59 NIGHT
KEEP 1 2010 5-Mar-14 6:00 21:59 DAY
KEEP 1 2010 5-Mar-14 22:00 5:59 NIGHT
KEEP 2 2010 6-Mar-14 6:00 21:59 DAY
KEEP 2 2010 6-Mar-14 22:00 5:59 NIGHT
PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

The code I originally suggested assumed that you actually wanted to delete those records. The following does what you are currently asking for:

 

data want (drop=_:);
  set have;
  by id;
  length action $6;
  retain _next_keep pair;
  if first.id then pair=0;
  if not last.id then do;
    _next=_n_+1;
    set have (keep=start_date value
         rename=(start_date=_next_start
                 value=_next_value)) point=_next;
    if _next_keep='KEEP' then do;
      action='KEEP';
      output;
      _next_keep='DELETE';
    end;
    else if start_date eq _next_start and
            value eq 'DAY' and
            _next_value eq 'NIGHT' then do;
      pair+1;
      action='KEEP';
      output;
      _next_keep='KEEP';
    end;
    else do;
      action='DELETE';
      output;
      _next_keep='DELETE';
    end;
  end;
  else if _next_keep eq 'KEEP' then do;
    action='KEEP';
    output;
    _next_keep='DELETE';
  end;
run;

HTH,

Art, CEO, AnalystFinder.com

Contributor
Posts: 41

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Thanks for the quick reply. Your adjusted code achieves all my goals except reset pair count to 1 if there are non-contiguous date records. See records in bold red in below example output dataset.

 

Action Pair ID Start Date Start Time End Time Value
KEEP 1 1024 22-Feb-14 6:00 21:59 DAY
KEEP 1 1024 22-Feb-14 22:00 5:59 NIGHT
KEEP 2 1024 23-Feb-14 6:00 21:59 DAY
KEEP 2 1024 23-Feb-14 22:00 5:59 NIGHT
KEEP 3 1024 24-Feb-14 6:00 21:59 DAY
KEEP 3 1024 24-Feb-14 22:00 5:59 NIGHT
KEEP 1 1040 3-Apr-14 6:00 21:59 DAY
KEEP 1 1040 3-Apr-14 22:00 5:59 NIGHT
DELETE . 1040 4-Apr-14 22:00 5:59 NIGHT
KEEP 1 1040 5-Apr-14 6:00 21:59 DAY
KEEP 1 1040 5-Apr-14 22:00 5:59 NIGHT
DELETE . 1040 6-Apr-14 22:00 5:59 NIGHT
KEEP 1 1040 7-Apr-14 6:00 21:59 DAY
KEEP 1 1040 7-Apr-14 22:00 5:59 NIGHT
KEEP 1 2003 1-Jul-13 6:00 21:59 DAY
KEEP 1 2003 1-Jul-13 22:00 5:59 NIGHT
DELETE . 2003 2-Jul-13 22:00 5:59 NIGHT
KEEP 1 2003 3-Jul-13 6:00 21:59 DAY
KEEP 1 2003 3-Jul-13 22:00 5:59 NIGHT
KEEP 2 2003 4-Jul-13 6:00 21:59 DAY
KEEP 2 2003 4-Jul-13 22:00 5:59 NIGHT
DELETE . 2003 5-Jul-13 6:00 21:59 DAY
KEEP 1 2010 2-Jun-13 6:01 21:59 DAY
KEEP 1 2010 2-Jun-13 22:00 6:00 NIGHT
KEEP 1 2010 16-Jul-13 6:01 21:59 DAY
KEEP 1 2010 16-Jul-13 22:00 6:00 NIGHT
KEEP 2 2010 17-Jul-13 6:01 21:59 DAY
KEEP 2 2010 17-Jul-13 22:00 6:00 NIGHT
KEEP 3 2010 24-Jul-13 6:01 21:59 DAY
KEEP 3 2010 24-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 25-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 26-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 28-Jul-13 22:00 6:00 NIGHT
KEEP 1 2010 29-Jul-13 6:01 21:59 DAY
KEEP 1 2010 29-Jul-13 22:00 6:00 NIGHT
DELETE . 2010 1-Aug-13 6:01 21:59 DAY
KEEP 1 2010 9-Aug-13 6:01 21:59 DAY
KEEP 1 2010 9-Aug-13 22:00 6:00 NIGHT
KEEP 2 2010 10-Aug-13 6:01 21:59 DAY
KEEP 2 2010 10-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 11-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 12-Aug-13 22:00 6:00 NIGHT
DELETE . 2010 1-Mar-14 22:00 5:59 NIGHT
DELETE . 2010 2-Mar-14 22:00 5:59 NIGHT
DELETE . 2010 3-Mar-14 22:00 5:59 NIGHT
KEEP 1 2010 5-Mar-14 6:00 21:59 DAY
KEEP 1 2010 5-Mar-14 22:00 5:59 NIGHT
KEEP 2 2010 6-Mar-14 6:00 21:59 DAY
KEEP 2 2010 6-Mar-14 22:00 5:59 NIGHT

 

 

PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

I question your pair assignment for the following two records:

KEEP 3 2010 24-Jul-13 6:01 21:59 DAY
KEEP 3 2010 24-Jul-13 22:00 6:00 NIGHT

 

according to your rules, I'd think they should be 1s.

 

If so, the following should work:

 

data want (drop=_:);
  set have;
  by id;
  length action $6;
  retain _next_keep _last_kept_date _pair;
  if first.id then _pair=0;
  if not last.id then do;
    _next=_n_+1;
    set have (keep=start_date value
         rename=(start_date=_next_start
                 value=_next_value)) point=_next;
    if _next_keep='KEEP' then do;
      action='KEEP';
      _last_kept_date=start_date;
      pair=_pair;
      output;
      _next_keep='DELETE';
    end;
    else if start_date eq _next_start and
            value eq 'DAY' and
            _next_value eq 'NIGHT' then do;
      if 0<=(start_date-_last_kept_date)<=1 then _pair+1;
      else _pair=1;
      pair=_pair;
      action='KEEP';
      output;
      _next_keep='KEEP';
    end;
    else do;
      action='DELETE';
      output;
      _next_keep='DELETE';
    end;
  end;
  else if _next_keep eq 'KEEP' then do;
    action='KEEP';
    pair=_pair;
    output;
    _next_keep='DELETE';
  end;
run;

HTH,

Art, CEO, AnalystFinder.com

 

Valued Guide
Posts: 797

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

This code produces what you want, includng the reset to pair=1 when a pair is more than one day after the most preceding pair.

 

HOWEVER, it appears your sample answer is wrong for the 24-July-2013 pair, which you have assigned as pair=3.  But the preceding pair is 1 week prior (17-JUL-13), so the 24-jul pair should be pair=1.

 

data want (drop=p nxt_val p_membr pair_date);

  set have (keep=id start_date);
  by id start_date;

  retain p 0       /* to track pair number */
         p_membr . /* 1=beginning of pair, 2=end of a pair */
         pair_date /* Date of most recent pair */

  if first.start_date=1 then do;
    if first.id then p=0;
    if pair_date < start_date-1 then p=0;
  end;

  merge have 
        have (firstobs=2 keep=id value rename=(value=nxt_val));

  /* If retain p_membr=1, this one must be p_membr=2*/
  if p_membr=1 then do;
    pair=p;
    p_membr=2;
  end;
  else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
    p_membr=1;
    pair_date=start_date;
    p=p+1;
    pair=p;
  end;
  else p_membr=.;
    
  if p_membr=. then action='DELETE';
  else action='KEEP';

run;
Contributor
Posts: 41

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Aye, yes you are correct. Long day Smiley Sad

 

Your code does work very well although it renumbers the last pair id to the next id.. See example output dataset below...

 

ID Start_Date Start_Time End_Time Value pair action
1024 22-Feb-14 6:00:00 21:59:00 DAY 1 KEEP
1024 22-Feb-14 22:00:00 5:59:00 NIGHT 1 KEEP
1024 23-Feb-14 6:00:00 21:59:00 DAY 2 KEEP
1024 23-Feb-14 22:00:00 5:59:00 NIGHT 2 KEEP
1024 24-Feb-14 6:00:00 21:59:00 DAY 3 KEEP
1040 24-Feb-14 22:00:00 5:59:00 NIGHT 3 KEEP
1040 3-Apr-14 6:00:00 21:59:00 DAY 1 KEEP
1040 3-Apr-14 22:00:00 5:59:00 NIGHT 1 KEEP
1040 4-Apr-14 22:00:00 5:59:00 NIGHT . DELETE
1040 5-Apr-14 6:00:00 21:59:00 DAY 2 KEEP
1040 5-Apr-14 22:00:00 5:59:00 NIGHT 2 KEEP
1040 6-Apr-14 22:00:00 5:59:00 NIGHT . DELETE
1040 7-Apr-14 6:00:00 21:59:00 DAY 3 KEEP
2003 7-Apr-14 22:00:00 5:59:00 NIGHT 3 KEEP
2003 1-Jul-13 6:00:00 21:59:00 DAY 1 KEEP
2003 1-Jul-13 22:00:00 5:59:00 NIGHT 1 KEEP
2003 2-Jul-13 22:00:00 5:59:00 NIGHT . DELETE
2003 3-Jul-13 6:00:00 21:59:00 DAY 2 KEEP
2003 3-Jul-13 22:00:00 5:59:00 NIGHT 2 KEEP
2003 4-Jul-13 6:00:00 21:59:00 DAY 3 KEEP
2003 4-Jul-13 22:00:00 5:59:00 NIGHT 3 KEEP
2010 5-Jul-13 6:00:00 21:59:00 DAY . DELETE
2010 2-Jun-13 6:01:00 21:59:00 DAY 1 KEEP
2010 2-Jun-13 22:00:00 6:00:00 NIGHT 1 KEEP
2010 16-Jul-13 6:01:00 21:59:00 DAY 1 KEEP
2010 16-Jul-13 22:00:00 6:00:00 NIGHT 1 KEEP
2010 17-Jul-13 6:01:00 21:59:00 DAY 2 KEEP
2010 17-Jul-13 22:00:00 6:00:00 NIGHT 2 KEEP
2010 24-Jul-13 6:01:00 21:59:00 DAY 1 KEEP
2010 24-Jul-13 22:00:00 6:00:00 NIGHT 1 KEEP
2010 25-Jul-13 22:00:00 6:00:00 NIGHT . DELETE
2010 26-Jul-13 22:00:00 6:00:00 NIGHT . DELETE
2010 28-Jul-13 22:00:00 6:00:00 NIGHT . DELETE
2010 29-Jul-13 6:01:00 21:59:00 DAY 1 KEEP
2010 29-Jul-13 22:00:00 6:00:00 NIGHT 1 KEEP
2010 1-Aug-13 6:01:00 21:59:00 DAY . DELETE
2010 9-Aug-13 6:01:00 21:59:00 DAY 1 KEEP
2010 9-Aug-13 22:00:00 6:00:00 NIGHT 1 KEEP
2010 10-Aug-13 6:01:00 21:59:00 DAY 2 KEEP
2010 10-Aug-13 22:00:00 6:00:00 NIGHT 2 KEEP
2010 11-Aug-13 22:00:00 6:00:00 NIGHT . DELETE
2010 12-Aug-13 22:00:00 6:00:00 NIGHT . DELETE
2010 1-Mar-14 22:00:00 5:59:00 NIGHT . DELETE
2010 2-Mar-14 22:00:00 5:59:00 NIGHT . DELETE
2010 3-Mar-14 22:00:00 5:59:00 NIGHT . DELETE
2010 5-Mar-14 6:00:00 21:59:00 DAY 1 KEEP
2010 5-Mar-14 22:00:00 5:59:00 NIGHT 1 KEEP
2010 6-Mar-14 6:00:00 21:59:00 DAY 2 KEEP
2010 6-Mar-14 22:00:00 5:59:00 NIGHT 2 KEEP
PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Did you try my revised code?

 

Art, CEO, AnalystFinder.com

Contributor
Posts: 41

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Yes. It works except for the resetting of pair = 1 when non-contiguous date occurs.

PROC Star
Posts: 7,364

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Not the code I posted about a half hour ago .. I don't think!

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 41

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

I think it's working. I'm double checking. I'm getting tired Smiley Sad

Valued Guide
Posts: 797

Re: For each patient, keep only paired records by date and then do counts based on # of contiguous p

Please mark @art297's message with the working code as the solution, not his request for you to test it.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 241 views
  • 5 likes
  • 4 in conversation