output where there are dates interruption records with overlapping or duplicate dates.
subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.
data set like below.
Subject | Row | stdt | edt | OP | |
ID1 | 1 | 3/6/2017 | 3/15/2017 | ||
ID1 | 2 | 3/16/2017 | 3/28/2017 | output | |
ID1 | 3 | 3/17/2017 | 3/17/2017 | output | |
ID1 | 4 | 3/18/2017 | 3/22/2017 | output | |
ID2 | 1 | 3/6/2017 | 3/15/2017 | ||
ID3 | 1 | 3/6/2017 | 3/15/2017 | ||
ID3 | 2 | 3/16/2017 | 3/28/2017 | ||
ID4 | 1 | 3/16/2017 | 3/28/2017 | output | |
ID4 | 2 | 3/15/2017 | 3/29/2017 | output | |
ID4 | 3 | 3/17/2017 | 3/30/2017 | output | |
ID4 | 4 | 3/18/2017 | 3/31/2017 | output | |
ID5 | 1 | 3/16/2017 | output | Duplicates | |
ID5 | 2 | 3/16/2017 | output | Duplicates | |
ID6 | 1 | 3/16/2017 | 3/28/2017 | output | Duplicates |
ID6 | 2 | 3/16/2017 | 3/28/2017 | output | Duplicates |
I am not quite sure I understand your logic involving dates.
Why is this one in the output:
ID4 | 1 | 3/16/2017 | 3/28/2017 | output |
|
but this one is not?
ID3 | 1 | 3/6/2017 | 3/15/2017 |
If there is a specific value of edt to compare you need to tell us what it may be.
I think this is what you are describing:
Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.
Is that your intention?
@mkeintz wrote:
I think this is what you are describing:
Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.
Is that your intention?
If my assumptions above are correct, then this program will provide what you need:
See revision comments below - (this program is incomplete).
data want;
do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
set have;
by subject;
merge have have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
period_edt=max(period_edt,edt);
if first.subject then period_edt=edt;
end;
if N>1 then OPx='output';
do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates'
else comment='N';
output;
end;
run;
Notes:
REvision:
Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period. Also a minor fix to account for missing edt:
data want;
do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
set have;
by subject;
merge have have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
period_edt=max(period_edt,edt,stdt);
period_stdt=min(period_stdt,stdt);
if first.subject then do;
period_edt=edt;
period_stdt=stdt;
end;
end;
if N>1 then OPx='output';
do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates';
else comment=' ';
output;
end;
run;
Post test data in the form of a datastep. As such I will just give some theory answers here:
To get duplicates, you can use:
proc sort ... nodupkey dupout=duplicates; by ...; run;
The by group identifies the duplciates.
As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.