Contributor
Posts: 25

# dates within the range by subjects

output where there are dates interruption records with overlapping or duplicate dates.

subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.

data set like below.

 Subject Row stdt edt OP ID1 1 3/6/2017 3/15/2017 ID1 2 3/16/2017 3/28/2017 output ID1 3 3/17/2017 3/17/2017 output ID1 4 3/18/2017 3/22/2017 output ID2 1 3/6/2017 3/15/2017 ID3 1 3/6/2017 3/15/2017 ID3 2 3/16/2017 3/28/2017 ID4 1 3/16/2017 3/28/2017 output ID4 2 3/15/2017 3/29/2017 output ID4 3 3/17/2017 3/30/2017 output ID4 4 3/18/2017 3/31/2017 output ID5 1 3/16/2017 output Duplicates ID5 2 3/16/2017 output Duplicates ID6 1 3/16/2017 3/28/2017 output Duplicates ID6 2 3/16/2017 3/28/2017 output Duplicates

Super User
Posts: 13,939

## Re: dates within the range by subjects

I am not quite sure I understand your logic involving dates.

Why is this one in the output:

 ID4 1 3/16/2017 3/28/2017 output

but this one is not?

 ID3 1 3/6/2017 3/15/2017

If there is a specific value of edt to compare you need to tell us what it may be.

Contributor
Posts: 25

## Re: dates within the range by subjects

for ID3 starting date on 6th end date 15th.. after end date (15th) then
again starting date 16th. that between that there is no other dates
falling. so we dont want that records.

for ID4 starting date is 16th and end date is 30th. there are many dates
falling b/w those dates.

once the starting date is first and end date is last than starting date
should be greater that last end date.

say for example drug a given for subject x on 6th and end on 10th. and
again started on 12th and end on 18th.
if any records coming b/w on 14th and end on 16th. that means duplicate
records.. once started on 12 th and ending on 18th. wy again 14th and 16th
day records.

attachments. -##
Posts: 1,389

## Re: dates within the range by subjects

I think this is what you are describing:

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

Posts: 1,389

## Re: dates within the range by subjects

[ Edited ]

mkeintz wrote:

I think this is what you are describing:

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

If my assumptions above are correct, then this program will provide what you need:

See revision comments below - (this program is incomplete).

``````data want;

do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
set have;
by subject;
merge have  have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
period_edt=max(period_edt,edt);
if first.subject then period_edt=edt;

end;
if N>1 then OPx='output';
do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates'
else comment='N';
output;
end;
run;
``````

Notes:

1. The program has two DO loops.  The first loops reads in a set of records (for a single "period"), and the 2nd re-reads the same set of records.
2. The first DO N=1 by 1 until ... does the following
1. If the record in hand overlaps with the prior cumulative period, then update the period_edt.
2. The loop stops at the end of a subject, or if the current period_edt falls short of the stdt of the next record
3. Then if N is >1 set "output" on
4. The second DO rereads the same records.
1. It has a "by subject stdt edt notsorted" to help detect when there are duplicate records

REvision:

Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period.  Also a minor fix to account for missing edt:

``````data want;

do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
set have;
by subject;
merge have  have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
period_edt=max(period_edt,edt,stdt);
period_stdt=min(period_stdt,stdt);
if first.subject then do;
period_edt=edt;
period_stdt=stdt;
end;
end;

if N>1 then OPx='output';

do I=1 to N;
set have;
by subject stdt edt notsorted;
if first.edt=0 or last.edt=0 then comment='Duplicates';
else comment=' ';
output;
end;
run;
``````
Super User
Posts: 9,840

## Re: dates within the range by subjects

Post test data in the form of a datastep.  As such I will just give some theory answers here:

To get duplicates, you can use:

```proc sort ... nodupkey dupout=duplicates;
by ...;
run;```

The by group identifies the duplciates.

As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.

Discussion stats
• 5 replies
• 148 views
• 1 like
• 4 in conversation