BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smunigala
Obsidian | Level 7

Hi all,

I have a dataset looking at orthopedic surgery procedures over time. Date of procedure is recorded as DATETIME20. I have some duplicate entries for same procedure on same date where some entries show datetime exactly (for eg. 10OCT2016 04:25:10) and some show just the date and 0 for time (for eg. 10OCT2016 00:00:00). FOr those with duplicate entries, I need to pick up the ones with exact date time (10OCT2016 04:25:10) and delete one with no time on it (10OCT2016 00:00:00). 

 

However, I have some observations with no duplicate entries on same date, but with date and 0 for time which I need to retain (for eg. 11OCT2016 00:00:00).

 

I would appreciate if I can get help in removing duplicate entries for same day with 0 time value but retaining unique observations with date and 0 time value.

 

ID variable: Reg_No

Date of Surgery: PROCEDURE_DATE (datetime 20.)

 

Here is the sample data from my study. I just want to remove the one I marked DELETE.

 

Reg_No     PROCEDURE_DATE

1001          11OCT2016 00:00:00       --Retain

1001          12OCT2016 00:00:00      --Delete

1001          12OCT2016 01:20:01      --Retain

1001          13OCT2016 00:00:00      --Retain

1001          14OCT2016 02:30:10      --Retain

1002          13OCT2016 00:00:00      --Retain

1002          14OCT2016 22:01:00      --Retain

1002          15OCT2016 00:00:00      --Retain

1003          15OCT2016 00:00:00      --Retain

 

 

 

Thank you!

Sat

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

The below solution supports the situation where you have 2+ records for the same Reg_no and same day.

It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.

 

data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001          11OCT2016 00:00:00
1001          12OCT2016 00:00:00
1001          12OCT2016 01:20:01
1001          13OCT2016 00:00:00
1001          14OCT2016 02:30:10
1002          13OCT2016 00:00:00
1002          14OCT2016 22:01:00
1002          15OCT2016 00:00:00
1003          15OCT2016 00:00:00
;
run;

proc sort data=work.smunigala NODUPKEY;
 by Reg_No procedure_date_dp procedure_date_tp;
run;

data work.smunigala1;
 set work.smunigala;
 by Reg_No procedure_date_dp procedure_date_tp;
 if first.procedure_date_dp then do;
  if     first.procedure_date_dp NE last.procedure_date_dp
     AND procedure_date_tp = '00:00:00't
   then delete;
  end;
run;
/* end of program */

 

Regards,

Koen

 

View solution in original post

7 REPLIES 7
ballardw
Super User

If the data is sorted by Reg_no and procedure_date then:

 

data temp;

    set have;

    dateonly=datepart (procedure_date);

run;

 

data want;

   set temp;

   by reg_no dateonly;

   if last.dateonly;

run;

 

seems to do what you request.

  

smunigala
Obsidian | Level 7

Hi,

Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.

 

Thanks,

Satish

ballardw
Super User

@smunigala wrote:

Hi,

Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.

 

Thanks,

Satish


Did you try the code?

Do you have records with more than 2 of the same date?

sbxkoenk
SAS Super FREQ

The below solution supports the situation where you have 2+ records for the same Reg_no and same day.

It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.

 

data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001          11OCT2016 00:00:00
1001          12OCT2016 00:00:00
1001          12OCT2016 01:20:01
1001          13OCT2016 00:00:00
1001          14OCT2016 02:30:10
1002          13OCT2016 00:00:00
1002          14OCT2016 22:01:00
1002          15OCT2016 00:00:00
1003          15OCT2016 00:00:00
;
run;

proc sort data=work.smunigala NODUPKEY;
 by Reg_No procedure_date_dp procedure_date_tp;
run;

data work.smunigala1;
 set work.smunigala;
 by Reg_No procedure_date_dp procedure_date_tp;
 if first.procedure_date_dp then do;
  if     first.procedure_date_dp NE last.procedure_date_dp
     AND procedure_date_tp = '00:00:00't
   then delete;
  end;
run;
/* end of program */

 

Regards,

Koen

 

sbxkoenk
SAS Super FREQ
Quick addendum: if there is only one record for a Reg_no & day combination, it is kept in the output in any case (even if the time was 00:00:00).
Koen
jhlaramore
SAS Employee
proc sql;
create table NEWDATASETNAME(drop=count_flag time_flag) as
select *
, case when count(reg_no)>1 then 1 else 0 end as count_flag
, case when timepart(procedure_date)='00:00:00't then 1 else 0 end as time_flag
from CURRENTDATASETNAME
group by reg_no,datepart(procedure_date)
having calculated count_flag + calculated time_flag < 2;
quit;

 

smunigala
Obsidian | Level 7

Thank you all for helping me with this!

Sat

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3126 views
  • 2 likes
  • 4 in conversation