Hello,
I'm trying to determine the number of patients in my dataset that have at least X contiguos paired records by date (DAY & NIGHT RECORD=PAIR). I need to delete records that do not have a PAIR and then create "PAIR#" value to flag contiguous records for the patient. PAIR# = 3 would indicate this is the 3rd contiguous PAIR of records for this patient. Start "PAIR#" over again at 1 if there are non-contiguous paired date records for a patient, in other words PAIR# does not have to be unique for a patient. See below example dataset.
ID | PAIR# | Start Date | Start Time | Value | Delete? |
1 | 12-Feb-14 | 6:00:00 | DAY | DELETE | |
1 | 1 | 12-Feb-14 | 6:01:00 | DAY | KEEP |
1 | 1 | 12-Feb-14 | 22:00:00 | NIGHT | KEEP |
1 | 2 | 13-Feb-14 | 6:00:00 | DAY | KEEP |
1 | 2 | 13-Feb-14 | 22:00:00 | NIGHT | KEEP |
1 | 3 | 14-Feb-14 | 6:00:00 | DAY | KEEP |
1 | 3 | 14-Feb-14 | 22:00:00 | NIGHT | KEEP |
3 | 16-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 1 | 17-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 1 | 17-Jan-13 | 22:00:00 | NIGHT | KEEP |
3 | 18-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 21-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 1 | 22-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 1 | 22-Jan-13 | 22:00:00 | NIGHT | KEEP |
3 | 2 | 23-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 2 | 23-Jan-13 | 22:00:00 | NIGHT | KEEP |
4 | 9-Jan-13 | 22:00:00 | NIGHT | DELETE | |
4 | 10-Jan-13 | 22:00:00 | NIGHT | DELETE | |
4 | 1 | 11-Jan-13 | 6:01:00 | DAY | KEEP |
4 | 1 | 11-Jan-13 | 22:00:00 | NIGHT | KEEP |
4 | 2 | 12-Jan-13 | 6:01:00 | DAY | KEEP |
4 | 2 | 12-Jan-13 | 22:00:00 | NIGHT | KEEP |
5 | 1 | 20-Feb-13 | 22:00:00 | NIGHT | KEEP |
5 | 1 | 20-Feb-13 | 6:01:00 | DAY | KEEP |
5 | 22-Feb-13 | 22:00:00 | NIGHT | DELETE | |
5 | 24-Feb-13 | 22:00:00 | NIGHT | DELETE | |
5 | 1 | 6-Mar-13 | 6:01:00 | DAY | KEEP |
5 | 1 | 6-Mar-13 | 22:00:00 | NIGHT | KEEP |
5 | 2 | 7-Mar-13 | 6:01:00 | DAY | KEEP |
5 | 2 | 7-Mar-13 | 22:00:00 | NIGHT | KEEP |
6 | 1 | 4-Oct-13 | 6:00:00 | DAY | KEEP |
6 | 1 | 4-Oct-13 | 22:00:00 | NIGHT | KEEP |
6 | 2 | 5-Oct-13 | 6:00:00 | DAY | KEEP |
6 | 2 | 5-Oct-13 | 22:00:00 | NIGHT | KEEP |
6 | 7-Oct-13 | 6:00:00 | DAY | DELETE |
Indeed! I missed the case where it was the last.id and started with an otherwise potentially valid record. This code should correct for that:
data want (drop=_:); set have; by id; length action $6; retain _next_keep _last_kept_date _pair; if first.id then _pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; _last_kept_date=start_date; pair=_pair; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; if 0<=(start_date-_last_kept_date)<=1 then _pair+1; else _pair=1; pair=_pair; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else do; if _next_keep eq 'KEEP' then do; action='KEEP'; pair=_pair; output; _next_keep='DELETE'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; run;
Art, CEO, AnalystFinder.com
I think that the following does what you want:
data want (drop=_:);
set have;
by id;
retain _next_keep pair;
if first.id then pair=0;
if not last.id then do;
_next=_n_+1;
set have (keep=start_date value
rename=(start_date=_next_start
value=_next_value)) point=_next;
if _next_keep='KEEP' then do;
output;
_next_keep='DELETE';
end;
else if start_date eq _next_start and
value eq 'DAY' and
_next_value eq 'NIGHT' then do;
pair+1;
output;
_next_keep='KEEP';
end;
else _next_keep='DELETE';
end;
else if _next_keep eq 'KEEP' then do;
output;
_next_keep='DELETE';
end;
run;
HTH,
Art, CEO, AnalystFinder.com
You identify pairs as any two consecutive records with the same id, same start_date, in which one record has value='DAY' and the other has value='NIGHT', in any order.
A pair is assigned a pair number, in which the first pair for an id is pair=1, and subsequent pairs are 2, 3, 4, ... EXCEPT if a given pair is NOT exactly one day after the previous pair, then it is reset to pair=1.
data have;
input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.;
format start_date date9. start_time time8.0;
datalines;
1 12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3 16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3 18-Jan-13 22:00:00 NIGHT DELETE
3 21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4 9-Jan-13 22:00:00 NIGHT DELETE
4 10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5 22-Feb-13 22:00:00 NIGHT DELETE
5 24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6 7-Oct-13 6:00:00 DAY DELETE
run;
data want (drop=p nxt_val p_membr);
set have (keep=id start_date);
by id start_date;
retain p 0 /* to track pair number */
p_membr . /* 1=beginning of pair, 2=end of a pair */;
if first.start_date then do until (1);
if not(first.id=1 or lag(start_date)<start_date-1) then leave;
p=0;
p_membr=.;
end;
merge have
have (firstobs=2 keep=id value rename=(value=nxt_val));
/* If last record was p_membr=1, this one is p_membr=2*/
if p_membr=1 then do;
pair=p;
p_membr=2;
end;
else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
p=p+1;
pair=p;
p_membr=1;
end;
else p_membr=.;
if p_membr=. then action='DELETE';
else action='KEEP';
run;
NOtes:
This may be related to the solution you want. Did not see a example of output
data have;
input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.;
format start_date date9. start_time time8.0;
if _pair ne '' then do;
input #1 (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.
#2 (xID x_PAIR) (1. +1 1.) xStart_Date :date9. xStart_Time time8.0 xValue :$5. x_Delete $6.;
if value='DAY' and xvalue='NIGHT' then output;
end;
cards4;
1 12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3 16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3 18-Jan-13 22:00:00 NIGHT DELETE
3 21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4 9-Jan-13 22:00:00 NIGHT DELETE
4 10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5 22-Feb-13 22:00:00 NIGHT DELETE
5 24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6 7-Oct-13 6:00:00 DAY DELETE
;;;;
run;quit;
My aplogies for not including an example output dataset. Also it may be easier to understand my goal if I only include variables available in my dataset.
data have;
input ID 4. Start_Date :date9. Start_Time time8.0 End_Time time8.0 Value :$5.;
format Start_Date date9. Start_Time time8.0 End_time time8.0;
datalines;
1024 22-Feb-14 6:00 21:59 DAY
1024 22-Feb-14 22:00 5:59 NIGHT
1024 23-Feb-14 6:00 21:59 DAY
1024 23-Feb-14 22:00 5:59 NIGHT
1024 24-Feb-14 6:00 21:59 DAY
1024 24-Feb-14 22:00 5:59 NIGHT
1040 3-Apr-14 6:00 21:59 DAY
1040 3-Apr-14 22:00 5:59 NIGHT
1040 4-Apr-14 22:00 5:59 NIGHT
1040 5-Apr-14 6:00 21:59 DAY
1040 5-Apr-14 22:00 5:59 NIGHT
1040 6-Apr-14 22:00 5:59 NIGHT
1040 7-Apr-14 6:00 21:59 DAY
1040 7-Apr-14 22:00 5:59 NIGHT
2003 1-Jul-13 6:00 21:59 DAY
2003 1-Jul-13 22:00 5:59 NIGHT
2003 2-Jul-13 22:00 5:59 NIGHT
2003 3-Jul-13 6:00 21:59 DAY
2003 3-Jul-13 22:00 5:59 NIGHT
2003 4-Jul-13 6:00 21:59 DAY
2003 4-Jul-13 22:00 5:59 NIGHT
2003 5-Jul-13 6:00 21:59 DAY
2010 2-Jun-13 6:01 21:59 DAY
2010 2-Jun-13 22:00 6:00 NIGHT
2010 16-Jul-13 6:01 21:59 DAY
2010 16-Jul-13 22:00 6:00 NIGHT
2010 17-Jul-13 6:01 21:59 DAY
2010 17-Jul-13 22:00 6:00 NIGHT
2010 24-Jul-13 6:01 21:59 DAY
2010 24-Jul-13 22:00 6:00 NIGHT
2010 25-Jul-13 22:00 6:00 NIGHT
2010 26-Jul-13 22:00 6:00 NIGHT
2010 28-Jul-13 22:00 6:00 NIGHT
2010 29-Jul-13 6:01 21:59 DAY
2010 29-Jul-13 22:00 6:00 NIGHT
2010 1-Aug-13 6:01 21:59 DAY
2010 9-Aug-13 6:01 21:59 DAY
2010 9-Aug-13 22:00 6:00 NIGHT
2010 10-Aug-13 6:01 21:59 DAY
2010 10-Aug-13 22:00 6:00 NIGHT
2010 11-Aug-13 22:00 6:00 NIGHT
2010 12-Aug-13 22:00 6:00 NIGHT
2010 1-Mar-14 22:00 5:59 NIGHT
2010 2-Mar-14 22:00 5:59 NIGHT
2010 3-Mar-14 22:00 5:59 NIGHT
2010 5-Mar-14 6:00 21:59 DAY
2010 5-Mar-14 22:00 5:59 NIGHT
2010 6-Mar-14 6:00 21:59 DAY
2010 6-Mar-14 22:00 5:59 NIGHT
run;
Output dataset with addition of 2 new variables added (Action, Pair).
Action | Pair | ID | Start Date | Start Time | End Time | Value |
KEEP | 1 | 1024 | 22-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1024 | 22-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 1024 | 23-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 1024 | 23-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 3 | 1024 | 24-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 3 | 1024 | 24-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 3-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 3-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 4-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 5-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 5-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 6-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 7-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 7-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 1-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 1-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 2-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 3-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 3-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2003 | 4-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2003 | 4-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 5-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 16-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 16-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 17-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 17-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 25-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 26-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 28-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 29-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 29-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 10-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 10-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 11-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 12-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 2-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 3-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2010 | 5-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 5-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2010 | 6-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2010 | 6-Mar-14 | 22:00 | 5:59 | NIGHT |
The code I originally suggested assumed that you actually wanted to delete those records. The following does what you are currently asking for:
data want (drop=_:); set have; by id; length action $6; retain _next_keep pair; if first.id then pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; pair+1; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else if _next_keep eq 'KEEP' then do; action='KEEP'; output; _next_keep='DELETE'; end; run;
HTH,
Art, CEO, AnalystFinder.com
Thanks for the quick reply. Your adjusted code achieves all my goals except reset pair count to 1 if there are non-contiguous date records. See records in bold red in below example output dataset.
Action | Pair | ID | Start Date | Start Time | End Time | Value |
KEEP | 1 | 1024 | 22-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1024 | 22-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 1024 | 23-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 1024 | 23-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 3 | 1024 | 24-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 3 | 1024 | 24-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 3-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 3-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 4-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 5-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 5-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 6-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 7-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 7-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 1-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 1-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 2-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 3-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 3-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2003 | 4-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2003 | 4-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 5-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 16-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 16-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 17-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 17-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 25-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 26-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 28-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 29-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 29-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 10-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 10-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 11-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 12-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 2-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 3-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2010 | 5-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 5-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2010 | 6-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2010 | 6-Mar-14 | 22:00 | 5:59 | NIGHT |
I question your pair assignment for the following two records:
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
according to your rules, I'd think they should be 1s.
If so, the following should work:
data want (drop=_:); set have; by id; length action $6; retain _next_keep _last_kept_date _pair; if first.id then _pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; _last_kept_date=start_date; pair=_pair; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; if 0<=(start_date-_last_kept_date)<=1 then _pair+1; else _pair=1; pair=_pair; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else if _next_keep eq 'KEEP' then do; action='KEEP'; pair=_pair; output; _next_keep='DELETE'; end; run;
HTH,
Art, CEO, AnalystFinder.com
This code produces what you want, includng the reset to pair=1 when a pair is more than one day after the most preceding pair.
HOWEVER, it appears your sample answer is wrong for the 24-July-2013 pair, which you have assigned as pair=3. But the preceding pair is 1 week prior (17-JUL-13), so the 24-jul pair should be pair=1.
data want (drop=p nxt_val p_membr pair_date);
set have (keep=id start_date);
by id start_date;
retain p 0 /* to track pair number */
p_membr . /* 1=beginning of pair, 2=end of a pair */
pair_date /* Date of most recent pair */
if first.start_date=1 then do;
if first.id then p=0;
if pair_date < start_date-1 then p=0;
end;
merge have
have (firstobs=2 keep=id value rename=(value=nxt_val));
/* If retain p_membr=1, this one must be p_membr=2*/
if p_membr=1 then do;
pair=p;
p_membr=2;
end;
else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
p_membr=1;
pair_date=start_date;
p=p+1;
pair=p;
end;
else p_membr=.;
if p_membr=. then action='DELETE';
else action='KEEP';
run;
Aye, yes you are correct. Long day 😞
Your code does work very well although it renumbers the last pair id to the next id.. See example output dataset below...
ID | Start_Date | Start_Time | End_Time | Value | pair | action |
1024 | 22-Feb-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
1024 | 22-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
1024 | 23-Feb-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
1024 | 23-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
1024 | 24-Feb-14 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
1040 | 24-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
1040 | 3-Apr-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
1040 | 3-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
1040 | 4-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
1040 | 5-Apr-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
1040 | 5-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
1040 | 6-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
1040 | 7-Apr-14 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
2003 | 7-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
2003 | 1-Jul-13 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
2003 | 1-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
2003 | 2-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2003 | 3-Jul-13 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
2003 | 3-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
2003 | 4-Jul-13 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
2003 | 4-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
2010 | 5-Jul-13 | 6:00:00 | 21:59:00 | DAY | . | DELETE |
2010 | 2-Jun-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 2-Jun-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 16-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 16-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 17-Jul-13 | 6:01:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 17-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 2 | KEEP |
2010 | 24-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 24-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 25-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 26-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 28-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 29-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 29-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 1-Aug-13 | 6:01:00 | 21:59:00 | DAY | . | DELETE |
2010 | 9-Aug-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 9-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 10-Aug-13 | 6:01:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 10-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | 2 | KEEP |
2010 | 11-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 12-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 1-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 2-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 3-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 5-Mar-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 5-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
2010 | 6-Mar-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 6-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
Did you try my revised code?
Art, CEO, AnalystFinder.com
Yes. It works except for the resetting of pair = 1 when non-contiguous date occurs.
Not the code I posted about a half hour ago .. I don't think!
Art, CEO, AnalystFinder.com
I think it's working. I'm double checking. I'm getting tired 😞
Please mark @art297's message with the working code as the solution, not his request for you to test it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.