Re: dates within the range by subjects

SrikanthY · Posted 03-15-2017 10:03 AM

output where there are dates interruption records with overlapping or duplicate dates.

subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.

data set like below.

Subject	Row	stdt	edt	OP
ID1	1	3/6/2017	3/15/2017
ID1	2	3/16/2017	3/28/2017	output
ID1	3	3/17/2017	3/17/2017	output
ID1	4	3/18/2017	3/22/2017	output
ID2	1	3/6/2017	3/15/2017
ID3	1	3/6/2017	3/15/2017
ID3	2	3/16/2017	3/28/2017
ID4	1	3/16/2017	3/28/2017	output
ID4	2	3/15/2017	3/29/2017	output
ID4	3	3/17/2017	3/30/2017	output
ID4	4	3/18/2017	3/31/2017	output
ID5	1	3/16/2017		output	Duplicates
ID5	2	3/16/2017		output	Duplicates
ID6	1	3/16/2017	3/28/2017	output	Duplicates
ID6	2	3/16/2017	3/28/2017	output	Duplicates

ballardw · Posted 03-15-2017 10:39 AM

I am not quite sure I understand your logic involving dates.

Why is this one in the output:

ID4

1

3/16/2017

3/28/2017

output

but this one is not?

ID3

1

3/6/2017

3/15/2017

If there is a specific value of edt to compare you need to tell us what it may be.

SrikanthY · Posted 03-15-2017 10:52 AM

for ID3 starting date on 6th end date 15th.. after end date (15th) then
again starting date 16th. that between that there is no other dates
falling. so we dont want that records.

for ID4 starting date is 16th and end date is 30th. there are many dates
falling b/w those dates.

once the starting date is first and end date is last than starting date
should be greater that last end date.

say for example drug a given for subject x on 6th and end on 10th. and
again started on 12th and end on 18th.
if any records coming b/w on 14th and end on 16th. that means duplicate
records.. once started on 12 th and ending on 18th. wy again 14th and 16th
day records.

##- Please type your reply above this line. Simple formatting, no
attachments. -##

mkeintz · Posted 03-15-2017 12:54 PM

I think this is what you are describing:

Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.

Is that your intention?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 03-15-2017 03:00 PM

@mkeintz wrote:

I think this is what you are describing:

Form a date range using the min STDT and max EDT from a series of consecutive records. Every record in the group must have at least a single day intersection with at least one other record in the group. Records that are merely adjacent to a range are not included.

Is that your intention?

If my assumptions above are correct, then this program will provide what you need:

See revision comments below - (this program is incomplete).

data want;

  do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
	period_edt=max(period_edt,edt);
	if first.subject then period_edt=edt;

  end;
  if N>1 then OPx='output';
  do I=1 to N; 
    set have;
	by subject stdt edt notsorted;
	if first.edt=0 or last.edt=0 then comment='Duplicates'
	else comment='N';
    output;
  end;
run;

Notes:

The program has two DO loops. The first loops reads in a set of records (for a single "period"), and the 2nd re-reads the same set of records.
The first DO N=1 by 1 until ... does the following
1. If the record in hand overlaps with the prior cumulative period, then update the period_edt.
2. The loop stops at the end of a subject, or if the current period_edt falls short of the stdt of the next record
Then if N is >1 set "output" on
The second DO rereads the same records.

It has a "by subject stdt edt notsorted" to help detect when there are duplicate records

REvision:

Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period. Also a minor fix to account for missing edt:

data want;

  do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
    period_edt=max(period_edt,edt,stdt);
    period_stdt=min(period_stdt,stdt);
    if first.subject then do;
      period_edt=edt;
      period_stdt=stdt;
    end;
  end;

  if N>1 then OPx='output';

  do I=1 to N; 
    set have;
      by subject stdt edt notsorted;
      if first.edt=0 or last.edt=0 then comment='Duplicates';
      else comment=' ';
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

RW9 · Posted 03-15-2017 10:39 AM

Post test data in the form of a datastep. As such I will just give some theory answers here:

To get duplicates, you can use:

proc sort ... nodupkey dupout=duplicates; 
  by ...; 
run;

The by group identifies the duplciates.

As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.

dates within the range by subjects