BookmarkSubscribeRSS Feed
SrikanthY
Calcite | Level 5

output where there are dates interruption records with overlapping or duplicate dates.


subject having multi records should be within the range from starting and end date
for example for ID1: we need after 2nd row. becoz end date is greater than starting date after that
records are within the range of dates. and also need to o/p dulicates.

data set like below.

 

SubjectRowstdtedtOP 
ID113/6/20173/15/2017  
ID123/16/20173/28/2017output 
ID133/17/20173/17/2017output 
ID143/18/20173/22/2017output 
ID213/6/20173/15/2017  
ID313/6/20173/15/2017  
ID323/16/20173/28/2017  
ID413/16/20173/28/2017output 
ID423/15/20173/29/2017output 
ID433/17/20173/30/2017output 
ID443/18/20173/31/2017output 
ID513/16/2017 outputDuplicates
ID523/16/2017 outputDuplicates
ID613/16/20173/28/2017outputDuplicates
ID623/16/20173/28/2017outputDuplicates

Capture1.PNG
5 REPLIES 5
ballardw
Super User

I am not quite sure I understand your logic involving dates.

Why is this one in the output:

ID4 1 3/16/2017 3/28/2017 output

 

 

but this one is not?

ID3 1 3/6/2017 3/15/2017  

If there is a specific value of edt to compare you need to tell us what it may be.

 

SrikanthY
Calcite | Level 5
for ID3 starting date on 6th end date 15th.. after end date (15th) then
again starting date 16th. that between that there is no other dates
falling. so we dont want that records.

for ID4 starting date is 16th and end date is 30th. there are many dates
falling b/w those dates.

once the starting date is first and end date is last than starting date
should be greater that last end date.

say for example drug a given for subject x on 6th and end on 10th. and
again started on 12th and end on 18th.
if any records coming b/w on 14th and end on 16th. that means duplicate
records.. once started on 12 th and ending on 18th. wy again 14th and 16th
day records.

##- Please type your reply above this line. Simple formatting, no
attachments. -##
mkeintz
PROC Star

I think this is what you are describing:

 

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

 

Is that your intention?

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

@mkeintz wrote:

I think this is what you are describing:

 

Form a date range using the min STDT and max EDT from a series of consecutive records.  Every record in the group must have at least a single day intersection with at least one other record in the group.  Records that are merely adjacent to a range are not included.

 

Is that your intention?

 

 

 


If my assumptions above are correct, then this program will provide what you need:

 

See revision comments below - (this program is incomplete).

 

data want;

  do N=1 by 1 until (last.subject or min(period_edt,nxt_stdt));
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt rename=(stdt=nxt_stdt));
	period_edt=max(period_edt,edt);
	if first.subject then period_edt=edt;

  end;
  if N>1 then OPx='output';
  do I=1 to N; 
    set have;
	by subject stdt edt notsorted;
	if first.edt=0 or last.edt=0 then comment='Duplicates'
	else comment='N';
    output;
  end;
run;

 

Notes:

  1. The program has two DO loops.  The first loops reads in a set of records (for a single "period"), and the 2nd re-reads the same set of records.
  2. The first DO N=1 by 1 until ... does the following
    1. If the record in hand overlaps with the prior cumulative period, then update the period_edt.
    2. The loop stops at the end of a subject, or if the current period_edt falls short of the stdt of the next record
  3. Then if N is >1 set "output" on
  4. The second DO rereads the same records.
    1. It has a "by subject stdt edt notsorted" to help detect when there are duplicate records

 

REvision:

  Besides a typo in the above program, I didn't program for a subsequent record overlapping the lower end of the current period.  Also a minor fix to account for missing edt:

 

data want;

  do N=1 by 1 until (last.subject or period_edt<nxt_stdt or period_stdt>coalesce(nxt_edt,nxt_stdt) );
    set have;
    by subject;
    merge have  have (firstobs=2 keep=stdt edt rename=(stdt=nxt_stdt edt=nxt_edt));
    period_edt=max(period_edt,edt,stdt);
    period_stdt=min(period_stdt,stdt);
    if first.subject then do;
      period_edt=edt;
      period_stdt=stdt;
    end;
  end;

  if N>1 then OPx='output';

  do I=1 to N; 
    set have;
      by subject stdt edt notsorted;
      if first.edt=0 or last.edt=0 then comment='Duplicates';
      else comment=' ';
    output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep.  As such I will just give some theory answers here:

To get duplicates, you can use:

proc sort ... nodupkey dupout=duplicates; 
  by ...; 
run;

The by group identifies the duplciates.

As for your other question on outputting several of the rows, sorry, I don't get the logic please calrify.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 769 views
  • 1 like
  • 4 in conversation