Hello,
I'm trying to determine the number of patients in my dataset that have at least X contiguos paired records by date (DAY & NIGHT RECORD=PAIR). I need to delete records that do not have a PAIR and then create "PAIR#" value to flag contiguous records for the patient. PAIR# = 3 would indicate this is the 3rd contiguous PAIR of records for this patient. Start "PAIR#" over again at 1 if there are non-contiguous paired date records for a patient, in other words PAIR# does not have to be unique for a patient. See below example dataset.
ID | PAIR# | Start Date | Start Time | Value | Delete? |
1 | 12-Feb-14 | 6:00:00 | DAY | DELETE | |
1 | 1 | 12-Feb-14 | 6:01:00 | DAY | KEEP |
1 | 1 | 12-Feb-14 | 22:00:00 | NIGHT | KEEP |
1 | 2 | 13-Feb-14 | 6:00:00 | DAY | KEEP |
1 | 2 | 13-Feb-14 | 22:00:00 | NIGHT | KEEP |
1 | 3 | 14-Feb-14 | 6:00:00 | DAY | KEEP |
1 | 3 | 14-Feb-14 | 22:00:00 | NIGHT | KEEP |
3 | 16-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 1 | 17-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 1 | 17-Jan-13 | 22:00:00 | NIGHT | KEEP |
3 | 18-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 21-Jan-13 | 22:00:00 | NIGHT | DELETE | |
3 | 1 | 22-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 1 | 22-Jan-13 | 22:00:00 | NIGHT | KEEP |
3 | 2 | 23-Jan-13 | 6:00:00 | DAY | KEEP |
3 | 2 | 23-Jan-13 | 22:00:00 | NIGHT | KEEP |
4 | 9-Jan-13 | 22:00:00 | NIGHT | DELETE | |
4 | 10-Jan-13 | 22:00:00 | NIGHT | DELETE | |
4 | 1 | 11-Jan-13 | 6:01:00 | DAY | KEEP |
4 | 1 | 11-Jan-13 | 22:00:00 | NIGHT | KEEP |
4 | 2 | 12-Jan-13 | 6:01:00 | DAY | KEEP |
4 | 2 | 12-Jan-13 | 22:00:00 | NIGHT | KEEP |
5 | 1 | 20-Feb-13 | 22:00:00 | NIGHT | KEEP |
5 | 1 | 20-Feb-13 | 6:01:00 | DAY | KEEP |
5 | 22-Feb-13 | 22:00:00 | NIGHT | DELETE | |
5 | 24-Feb-13 | 22:00:00 | NIGHT | DELETE | |
5 | 1 | 6-Mar-13 | 6:01:00 | DAY | KEEP |
5 | 1 | 6-Mar-13 | 22:00:00 | NIGHT | KEEP |
5 | 2 | 7-Mar-13 | 6:01:00 | DAY | KEEP |
5 | 2 | 7-Mar-13 | 22:00:00 | NIGHT | KEEP |
6 | 1 | 4-Oct-13 | 6:00:00 | DAY | KEEP |
6 | 1 | 4-Oct-13 | 22:00:00 | NIGHT | KEEP |
6 | 2 | 5-Oct-13 | 6:00:00 | DAY | KEEP |
6 | 2 | 5-Oct-13 | 22:00:00 | NIGHT | KEEP |
6 | 7-Oct-13 | 6:00:00 | DAY | DELETE |
Indeed! I missed the case where it was the last.id and started with an otherwise potentially valid record. This code should correct for that:
data want (drop=_:); set have; by id; length action $6; retain _next_keep _last_kept_date _pair; if first.id then _pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; _last_kept_date=start_date; pair=_pair; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; if 0<=(start_date-_last_kept_date)<=1 then _pair+1; else _pair=1; pair=_pair; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else do; if _next_keep eq 'KEEP' then do; action='KEEP'; pair=_pair; output; _next_keep='DELETE'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; run;
Art, CEO, AnalystFinder.com
I think that the following does what you want:
data want (drop=_:);
set have;
by id;
retain _next_keep pair;
if first.id then pair=0;
if not last.id then do;
_next=_n_+1;
set have (keep=start_date value
rename=(start_date=_next_start
value=_next_value)) point=_next;
if _next_keep='KEEP' then do;
output;
_next_keep='DELETE';
end;
else if start_date eq _next_start and
value eq 'DAY' and
_next_value eq 'NIGHT' then do;
pair+1;
output;
_next_keep='KEEP';
end;
else _next_keep='DELETE';
end;
else if _next_keep eq 'KEEP' then do;
output;
_next_keep='DELETE';
end;
run;
HTH,
Art, CEO, AnalystFinder.com
You identify pairs as any two consecutive records with the same id, same start_date, in which one record has value='DAY' and the other has value='NIGHT', in any order.
A pair is assigned a pair number, in which the first pair for an id is pair=1, and subsequent pairs are 2, 3, 4, ... EXCEPT if a given pair is NOT exactly one day after the previous pair, then it is reset to pair=1.
data have;
input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.;
format start_date date9. start_time time8.0;
datalines;
1 12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3 16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3 18-Jan-13 22:00:00 NIGHT DELETE
3 21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4 9-Jan-13 22:00:00 NIGHT DELETE
4 10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5 22-Feb-13 22:00:00 NIGHT DELETE
5 24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6 7-Oct-13 6:00:00 DAY DELETE
run;
data want (drop=p nxt_val p_membr);
set have (keep=id start_date);
by id start_date;
retain p 0 /* to track pair number */
p_membr . /* 1=beginning of pair, 2=end of a pair */;
if first.start_date then do until (1);
if not(first.id=1 or lag(start_date)<start_date-1) then leave;
p=0;
p_membr=.;
end;
merge have
have (firstobs=2 keep=id value rename=(value=nxt_val));
/* If last record was p_membr=1, this one is p_membr=2*/
if p_membr=1 then do;
pair=p;
p_membr=2;
end;
else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
p=p+1;
pair=p;
p_membr=1;
end;
else p_membr=.;
if p_membr=. then action='DELETE';
else action='KEEP';
run;
NOtes:
This may be related to the solution you want. Did not see a example of output
data have;
input (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.;
format start_date date9. start_time time8.0;
if _pair ne '' then do;
input #1 (ID _PAIR) (1. +1 1.) Start_Date :date9. Start_Time time8.0 Value :$5. _Delete $6.
#2 (xID x_PAIR) (1. +1 1.) xStart_Date :date9. xStart_Time time8.0 xValue :$5. x_Delete $6.;
if value='DAY' and xvalue='NIGHT' then output;
end;
cards4;
1 12-Feb-14 6:00:00 DAY DELETE
1 1 12-Feb-14 6:01:00 DAY KEEP
1 1 12-Feb-14 22:00:00 NIGHT KEEP
1 2 13-Feb-14 6:00:00 DAY KEEP
1 2 13-Feb-14 22:00:00 NIGHT KEEP
1 3 14-Feb-14 6:00:00 DAY KEEP
1 3 14-Feb-14 22:00:00 NIGHT KEEP
3 16-Jan-13 22:00:00 NIGHT DELETE
3 1 17-Jan-13 6:00:00 DAY KEEP
3 1 17-Jan-13 22:00:00 NIGHT KEEP
3 18-Jan-13 22:00:00 NIGHT DELETE
3 21-Jan-13 22:00:00 NIGHT DELETE
3 1 22-Jan-13 6:00:00 DAY KEEP
3 1 22-Jan-13 22:00:00 NIGHT KEEP
3 2 23-Jan-13 6:00:00 DAY KEEP
3 2 23-Jan-13 22:00:00 NIGHT KEEP
4 9-Jan-13 22:00:00 NIGHT DELETE
4 10-Jan-13 22:00:00 NIGHT DELETE
4 1 11-Jan-13 6:01:00 DAY KEEP
4 1 11-Jan-13 22:00:00 NIGHT KEEP
4 2 12-Jan-13 6:01:00 DAY KEEP
4 2 12-Jan-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 22:00:00 NIGHT KEEP
5 1 20-Feb-13 6:01:00 DAY KEEP
5 22-Feb-13 22:00:00 NIGHT DELETE
5 24-Feb-13 22:00:00 NIGHT DELETE
5 1 6-Mar-13 6:01:00 DAY KEEP
5 1 6-Mar-13 22:00:00 NIGHT KEEP
5 2 7-Mar-13 6:01:00 DAY KEEP
5 2 7-Mar-13 22:00:00 NIGHT KEEP
6 1 4-Oct-13 6:00:00 DAY KEEP
6 1 4-Oct-13 22:00:00 NIGHT KEEP
6 2 5-Oct-13 6:00:00 DAY KEEP
6 2 5-Oct-13 22:00:00 NIGHT KEEP
6 7-Oct-13 6:00:00 DAY DELETE
;;;;
run;quit;
My aplogies for not including an example output dataset. Also it may be easier to understand my goal if I only include variables available in my dataset.
data have;
input ID 4. Start_Date :date9. Start_Time time8.0 End_Time time8.0 Value :$5.;
format Start_Date date9. Start_Time time8.0 End_time time8.0;
datalines;
1024 22-Feb-14 6:00 21:59 DAY
1024 22-Feb-14 22:00 5:59 NIGHT
1024 23-Feb-14 6:00 21:59 DAY
1024 23-Feb-14 22:00 5:59 NIGHT
1024 24-Feb-14 6:00 21:59 DAY
1024 24-Feb-14 22:00 5:59 NIGHT
1040 3-Apr-14 6:00 21:59 DAY
1040 3-Apr-14 22:00 5:59 NIGHT
1040 4-Apr-14 22:00 5:59 NIGHT
1040 5-Apr-14 6:00 21:59 DAY
1040 5-Apr-14 22:00 5:59 NIGHT
1040 6-Apr-14 22:00 5:59 NIGHT
1040 7-Apr-14 6:00 21:59 DAY
1040 7-Apr-14 22:00 5:59 NIGHT
2003 1-Jul-13 6:00 21:59 DAY
2003 1-Jul-13 22:00 5:59 NIGHT
2003 2-Jul-13 22:00 5:59 NIGHT
2003 3-Jul-13 6:00 21:59 DAY
2003 3-Jul-13 22:00 5:59 NIGHT
2003 4-Jul-13 6:00 21:59 DAY
2003 4-Jul-13 22:00 5:59 NIGHT
2003 5-Jul-13 6:00 21:59 DAY
2010 2-Jun-13 6:01 21:59 DAY
2010 2-Jun-13 22:00 6:00 NIGHT
2010 16-Jul-13 6:01 21:59 DAY
2010 16-Jul-13 22:00 6:00 NIGHT
2010 17-Jul-13 6:01 21:59 DAY
2010 17-Jul-13 22:00 6:00 NIGHT
2010 24-Jul-13 6:01 21:59 DAY
2010 24-Jul-13 22:00 6:00 NIGHT
2010 25-Jul-13 22:00 6:00 NIGHT
2010 26-Jul-13 22:00 6:00 NIGHT
2010 28-Jul-13 22:00 6:00 NIGHT
2010 29-Jul-13 6:01 21:59 DAY
2010 29-Jul-13 22:00 6:00 NIGHT
2010 1-Aug-13 6:01 21:59 DAY
2010 9-Aug-13 6:01 21:59 DAY
2010 9-Aug-13 22:00 6:00 NIGHT
2010 10-Aug-13 6:01 21:59 DAY
2010 10-Aug-13 22:00 6:00 NIGHT
2010 11-Aug-13 22:00 6:00 NIGHT
2010 12-Aug-13 22:00 6:00 NIGHT
2010 1-Mar-14 22:00 5:59 NIGHT
2010 2-Mar-14 22:00 5:59 NIGHT
2010 3-Mar-14 22:00 5:59 NIGHT
2010 5-Mar-14 6:00 21:59 DAY
2010 5-Mar-14 22:00 5:59 NIGHT
2010 6-Mar-14 6:00 21:59 DAY
2010 6-Mar-14 22:00 5:59 NIGHT
run;
Output dataset with addition of 2 new variables added (Action, Pair).
Action | Pair | ID | Start Date | Start Time | End Time | Value |
KEEP | 1 | 1024 | 22-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1024 | 22-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 1024 | 23-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 1024 | 23-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 3 | 1024 | 24-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 3 | 1024 | 24-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 3-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 3-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 4-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 5-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 5-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 6-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 7-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 7-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 1-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 1-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 2-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 3-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 3-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2003 | 4-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2003 | 4-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 5-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 16-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 16-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 17-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 17-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 25-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 26-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 28-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 29-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 29-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 10-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 10-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 11-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 12-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 2-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 3-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2010 | 5-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 5-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2010 | 6-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2010 | 6-Mar-14 | 22:00 | 5:59 | NIGHT |
The code I originally suggested assumed that you actually wanted to delete those records. The following does what you are currently asking for:
data want (drop=_:); set have; by id; length action $6; retain _next_keep pair; if first.id then pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; pair+1; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else if _next_keep eq 'KEEP' then do; action='KEEP'; output; _next_keep='DELETE'; end; run;
HTH,
Art, CEO, AnalystFinder.com
Thanks for the quick reply. Your adjusted code achieves all my goals except reset pair count to 1 if there are non-contiguous date records. See records in bold red in below example output dataset.
Action | Pair | ID | Start Date | Start Time | End Time | Value |
KEEP | 1 | 1024 | 22-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1024 | 22-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 1024 | 23-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 1024 | 23-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 3 | 1024 | 24-Feb-14 | 6:00 | 21:59 | DAY |
KEEP | 3 | 1024 | 24-Feb-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 3-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 3-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 4-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 5-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 5-Apr-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 1040 | 6-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 1040 | 7-Apr-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 1040 | 7-Apr-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 1-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 1-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 2-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2003 | 3-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2003 | 3-Jul-13 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2003 | 4-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2003 | 4-Jul-13 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2003 | 5-Jul-13 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 2-Jun-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 16-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 16-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 17-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 17-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 25-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 26-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 28-Jul-13 | 22:00 | 6:00 | NIGHT |
KEEP | 1 | 2010 | 29-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 29-Jul-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 1 | 2010 | 9-Aug-13 | 22:00 | 6:00 | NIGHT |
KEEP | 2 | 2010 | 10-Aug-13 | 6:01 | 21:59 | DAY |
KEEP | 2 | 2010 | 10-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 11-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 12-Aug-13 | 22:00 | 6:00 | NIGHT |
DELETE | . | 2010 | 1-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 2-Mar-14 | 22:00 | 5:59 | NIGHT |
DELETE | . | 2010 | 3-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 1 | 2010 | 5-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 1 | 2010 | 5-Mar-14 | 22:00 | 5:59 | NIGHT |
KEEP | 2 | 2010 | 6-Mar-14 | 6:00 | 21:59 | DAY |
KEEP | 2 | 2010 | 6-Mar-14 | 22:00 | 5:59 | NIGHT |
I question your pair assignment for the following two records:
KEEP | 3 | 2010 | 24-Jul-13 | 6:01 | 21:59 | DAY |
KEEP | 3 | 2010 | 24-Jul-13 | 22:00 | 6:00 | NIGHT |
according to your rules, I'd think they should be 1s.
If so, the following should work:
data want (drop=_:); set have; by id; length action $6; retain _next_keep _last_kept_date _pair; if first.id then _pair=0; if not last.id then do; _next=_n_+1; set have (keep=start_date value rename=(start_date=_next_start value=_next_value)) point=_next; if _next_keep='KEEP' then do; action='KEEP'; _last_kept_date=start_date; pair=_pair; output; _next_keep='DELETE'; end; else if start_date eq _next_start and value eq 'DAY' and _next_value eq 'NIGHT' then do; if 0<=(start_date-_last_kept_date)<=1 then _pair+1; else _pair=1; pair=_pair; action='KEEP'; output; _next_keep='KEEP'; end; else do; action='DELETE'; output; _next_keep='DELETE'; end; end; else if _next_keep eq 'KEEP' then do; action='KEEP'; pair=_pair; output; _next_keep='DELETE'; end; run;
HTH,
Art, CEO, AnalystFinder.com
This code produces what you want, includng the reset to pair=1 when a pair is more than one day after the most preceding pair.
HOWEVER, it appears your sample answer is wrong for the 24-July-2013 pair, which you have assigned as pair=3. But the preceding pair is 1 week prior (17-JUL-13), so the 24-jul pair should be pair=1.
data want (drop=p nxt_val p_membr pair_date);
set have (keep=id start_date);
by id start_date;
retain p 0 /* to track pair number */
p_membr . /* 1=beginning of pair, 2=end of a pair */
pair_date /* Date of most recent pair */
if first.start_date=1 then do;
if first.id then p=0;
if pair_date < start_date-1 then p=0;
end;
merge have
have (firstobs=2 keep=id value rename=(value=nxt_val));
/* If retain p_membr=1, this one must be p_membr=2*/
if p_membr=1 then do;
pair=p;
p_membr=2;
end;
else if last.start_date=0 and catx('/',value,nxt_val) in ('DAY/NIGHT','NIGHT/DAY') then do;
p_membr=1;
pair_date=start_date;
p=p+1;
pair=p;
end;
else p_membr=.;
if p_membr=. then action='DELETE';
else action='KEEP';
run;
Aye, yes you are correct. Long day 😞
Your code does work very well although it renumbers the last pair id to the next id.. See example output dataset below...
ID | Start_Date | Start_Time | End_Time | Value | pair | action |
1024 | 22-Feb-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
1024 | 22-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
1024 | 23-Feb-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
1024 | 23-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
1024 | 24-Feb-14 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
1040 | 24-Feb-14 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
1040 | 3-Apr-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
1040 | 3-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
1040 | 4-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
1040 | 5-Apr-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
1040 | 5-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
1040 | 6-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
1040 | 7-Apr-14 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
2003 | 7-Apr-14 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
2003 | 1-Jul-13 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
2003 | 1-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
2003 | 2-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2003 | 3-Jul-13 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
2003 | 3-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
2003 | 4-Jul-13 | 6:00:00 | 21:59:00 | DAY | 3 | KEEP |
2003 | 4-Jul-13 | 22:00:00 | 5:59:00 | NIGHT | 3 | KEEP |
2010 | 5-Jul-13 | 6:00:00 | 21:59:00 | DAY | . | DELETE |
2010 | 2-Jun-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 2-Jun-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 16-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 16-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 17-Jul-13 | 6:01:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 17-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 2 | KEEP |
2010 | 24-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 24-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 25-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 26-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 28-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 29-Jul-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 29-Jul-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 1-Aug-13 | 6:01:00 | 21:59:00 | DAY | . | DELETE |
2010 | 9-Aug-13 | 6:01:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 9-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | 1 | KEEP |
2010 | 10-Aug-13 | 6:01:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 10-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | 2 | KEEP |
2010 | 11-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 12-Aug-13 | 22:00:00 | 6:00:00 | NIGHT | . | DELETE |
2010 | 1-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 2-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 3-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | . | DELETE |
2010 | 5-Mar-14 | 6:00:00 | 21:59:00 | DAY | 1 | KEEP |
2010 | 5-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | 1 | KEEP |
2010 | 6-Mar-14 | 6:00:00 | 21:59:00 | DAY | 2 | KEEP |
2010 | 6-Mar-14 | 22:00:00 | 5:59:00 | NIGHT | 2 | KEEP |
Did you try my revised code?
Art, CEO, AnalystFinder.com
Yes. It works except for the resetting of pair = 1 when non-contiguous date occurs.
Not the code I posted about a half hour ago .. I don't think!
Art, CEO, AnalystFinder.com
I think it's working. I'm double checking. I'm getting tired 😞
Please mark @art297's message with the working code as the solution, not his request for you to test it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.