DATA Step, Macro, Functions and more

dates within the range by subjects

Reply
Contributor
Posts: 24

dates within the range by subjects

output where there are dates interruption records with overlapping or duplicate dates.


subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.

data set like below.

 

SubjectRowstdtedtOP 
ID113/6/20173/15/2017  
ID123/16/20173/28/2017output 
ID133/17/20173/17/2017output 
ID143/18/20173/22/2017output 
ID213/6/20173/15/2017  
ID313/6/20173/15/2017  
ID323/16/20173/28/2017  
ID413/16/20173/28/2017output 
ID423/15/20173/29/2017output 
ID433/17/20173/30/2017output 
ID443/18/20173/31/2017output 
ID513/16/2017 outputDuplicates
ID523/16/2017 outputDuplicates
ID613/16/20173/28/2017outputDuplicates
ID623/16/20173/28/2017outputDuplicates

Capture1.PNG
Super User
Posts: 11,343

Re: dates within the range by subjects

Posted in reply to SrikanthY

I am not quite sure I understand your logic involving dates.

Why is this one in the output:

ID4 1 3/16/2017 3/28/2017 output

 

 

but this one is not?

ID3 1 3/6/2017 3/15/2017  

If there is a specific value of edt to compare you need to tell us what it may be.

 

Contributor
Posts: 24

Re: dates within the range by subjects

for ID3 starting date on 6th end date 15th.. after end date (15th) then
again starting date 16th. that between that there is no other dates
falling. so we dont want that records.

for ID4 starting date is 16th and end date is 30th. there are many dates
falling b/w those dates.

once the starting date is first and end date is last than starting date
should be greater that last end date.

say for example drug a given for subject x on 6th and end on 10th. and
again started on 12th and end on 18th.
if any records coming b/w on 14th and end on 16th. that means duplicate
records.. once started on 12 th and ending on 18th. wy again 14th and 16th
day records.

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Trusted Advisor
Posts: 1,022

Re: dates within the range by subjects

Posted in reply to SrikanthY

I think this is what you are describing:

 

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

 

Is that your intention?

 

 

 

Trusted Advisor
Posts: 1,022

Re: dates within the range by subjects

[ Edited ]

mkeintz wrote:

I think this is what you are describing:

 

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

 

Is that your intention?

 

 

 


If my assumptions above are correct, then this program will provide what you need:

 

See revision comments below - (this program is incomplete).

 

data want;

  do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
	period_edt=max(period_edt,edt);
	if first.subject then period_edt=edt;

  end;
  if N>1 then OPx='output';
  do I=1 to N; 
    set have;
	by subject stdt edt notsorted;
	if first.edt=0 or last.edt=0 then comment='Duplicates'
	else comment='N';
    output;
  end;
run;

 

Notes:

  1. The program has two DO loops.  The first loops reads in a set of records (for a single "period"), and the 2nd re-reads the same set of records.
  2. The first DO N=1 by 1 until ... does the following
    1. If the record in hand overlaps with the prior cumulative period, then update the period_edt.
    2. The loop stops at the end of a subject, or if the current period_edt falls short of the stdt of the next record
  3. Then if N is >1 set "output" on
  4. The second DO rereads the same records.
    1. It has a "by subject stdt edt notsorted" to help detect when there are duplicate records

 

REvision:

  Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period.  Also a minor fix to account for missing edt:

 

data want;

  do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
    period_edt=max(period_edt,edt,stdt);
    period_stdt=min(period_stdt,stdt);
    if first.subject then do;
      period_edt=edt;
      period_stdt=stdt;
    end;
  end;

  if N>1 then OPx='output';

  do I=1 to N; 
    set have;
      by subject stdt edt notsorted;
      if first.edt=0 or last.edt=0 then comment='Duplicates';
      else comment=' ';
    output;
  end;
run;
Super User
Super User
Posts: 7,970

Re: dates within the range by subjects

Posted in reply to SrikanthY

Post test data in the form of a datastep.  As such I will just give some theory answers here:

To get duplicates, you can use:

proc sort ... nodupkey dupout=duplicates; 
  by ...; 
run;

The by group identifies the duplciates.

As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.

Ask a Question
Discussion stats
  • 5 replies
  • 125 views
  • 1 like
  • 4 in conversation