output where there are dates interruption records with overlapping or duplicate dates.
subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.
data set like below.
Subject | Row | stdt | edt | OP | |
ID1 | 1 | 3/6/2017 | 3/15/2017 | ||
ID1 | 2 | 3/16/2017 | 3/28/2017 | output | |
ID1 | 3 | 3/17/2017 | 3/17/2017 | output | |
ID1 | 4 | 3/18/2017 | 3/22/2017 | output | |
ID2 | 1 | 3/6/2017 | 3/15/2017 | ||
ID3 | 1 | 3/6/2017 | 3/15/2017 | ||
ID3 | 2 | 3/16/2017 | 3/28/2017 | ||
ID4 | 1 | 3/16/2017 | 3/28/2017 | output | |
ID4 | 2 | 3/15/2017 | 3/29/2017 | output | |
ID4 | 3 | 3/17/2017 | 3/30/2017 | output | |
ID4 | 4 | 3/18/2017 | 3/31/2017 | output | |
ID5 | 1 | 3/16/2017 | output | Duplicates | |
ID5 | 2 | 3/16/2017 | output | Duplicates | |
ID6 | 1 | 3/16/2017 | 3/28/2017 | output | Duplicates |
ID6 | 2 | 3/16/2017 | 3/28/2017 | output | Duplicates |
I am not quite sure I understand your logic involving dates.
Why is this one in the output:
ID4 | 1 | 3/16/2017 | 3/28/2017 | output |
|
but this one is not?
ID3 | 1 | 3/6/2017 | 3/15/2017 |
If there is a specific value of edt to compare you need to tell us what it may be.
I think this is what you are describing:
Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.
Is that your intention?
@mkeintz wrote:
I think this is what you are describing:
Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.
Is that your intention?
If my assumptions above are correct, then this program will provide what you need:
See revision comments below - (this program is incomplete).
data want;
do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
set have;
by subject;
merge have have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
period_edt=max(period_edt,edt);
if first.subject then period_edt=edt;
end;
if N>1 then OPx='output';
do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates'
else comment='N';
output;
end;
run;
Notes:
REvision:
Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period. Also a minor fix to account for missing edt:
data want;
do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
set have;
by subject;
merge have have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
period_edt=max(period_edt,edt,stdt);
period_stdt=min(period_stdt,stdt);
if first.subject then do;
period_edt=edt;
period_stdt=stdt;
end;
end;
if N>1 then OPx='output';
do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates';
else comment=' ';
output;
end;
run;
Post test data in the form of a datastep. As such I will just give some theory answers here:
To get duplicates, you can use:
proc sort ... nodupkey dupout=duplicates; by ...; run;
The by group identifies the duplciates.
As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.