BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smunigala
Obsidian | Level 7

Hi all,

I have a dataset looking at orthopedic surgery procedures over time. Date of procedure is recorded as DATETIME20. I have some duplicate entries for same procedure on same date where some entries show datetime exactly (for eg. 10OCT2016 04:25:10) and some show just the date and 0 for time (for eg. 10OCT2016 00:00:00). FOr those with duplicate entries, I need to pick up the ones with exact date time (10OCT2016 04:25:10) and delete one with no time on it (10OCT2016 00:00:00). 

 

However, I have some observations with no duplicate entries on same date, but with date and 0 for time which I need to retain (for eg. 11OCT2016 00:00:00).

 

I would appreciate if I can get help in removing duplicate entries for same day with 0 time value but retaining unique observations with date and 0 time value.

 

ID variable: Reg_No

Date of Surgery: PROCEDURE_DATE (datetime 20.)

 

Here is the sample data from my study. I just want to remove the one I marked DELETE.

 

Reg_No     PROCEDURE_DATE

1001          11OCT2016 00:00:00       --Retain

1001          12OCT2016 00:00:00      --Delete

1001          12OCT2016 01:20:01      --Retain

1001          13OCT2016 00:00:00      --Retain

1001          14OCT2016 02:30:10      --Retain

1002          13OCT2016 00:00:00      --Retain

1002          14OCT2016 22:01:00      --Retain

1002          15OCT2016 00:00:00      --Retain

1003          15OCT2016 00:00:00      --Retain

 

 

 

Thank you!

Sat

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

The below solution supports the situation where you have 2+ records for the same Reg_no and same day.

It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.

 

data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001          11OCT2016 00:00:00
1001          12OCT2016 00:00:00
1001          12OCT2016 01:20:01
1001          13OCT2016 00:00:00
1001          14OCT2016 02:30:10
1002          13OCT2016 00:00:00
1002          14OCT2016 22:01:00
1002          15OCT2016 00:00:00
1003          15OCT2016 00:00:00
;
run;

proc sort data=work.smunigala NODUPKEY;
 by Reg_No procedure_date_dp procedure_date_tp;
run;

data work.smunigala1;
 set work.smunigala;
 by Reg_No procedure_date_dp procedure_date_tp;
 if first.procedure_date_dp then do;
  if     first.procedure_date_dp NE last.procedure_date_dp
     AND procedure_date_tp = '00:00:00't
   then delete;
  end;
run;
/* end of program */

 

Regards,

Koen

 

View solution in original post

7 REPLIES 7
ballardw
Super User

If the data is sorted by Reg_no and procedure_date then:

 

data temp;

    set have;

    dateonly=datepart (procedure_date);

run;

 

data want;

   set temp;

   by reg_no dateonly;

   if last.dateonly;

run;

 

seems to do what you request.

  

smunigala
Obsidian | Level 7

Hi,

Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.

 

Thanks,

Satish

ballardw
Super User

@smunigala wrote:

Hi,

Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.

 

Thanks,

Satish


Did you try the code?

Do you have records with more than 2 of the same date?

sbxkoenk
SAS Super FREQ

The below solution supports the situation where you have 2+ records for the same Reg_no and same day.

It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.

 

data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001          11OCT2016 00:00:00
1001          12OCT2016 00:00:00
1001          12OCT2016 01:20:01
1001          13OCT2016 00:00:00
1001          14OCT2016 02:30:10
1002          13OCT2016 00:00:00
1002          14OCT2016 22:01:00
1002          15OCT2016 00:00:00
1003          15OCT2016 00:00:00
;
run;

proc sort data=work.smunigala NODUPKEY;
 by Reg_No procedure_date_dp procedure_date_tp;
run;

data work.smunigala1;
 set work.smunigala;
 by Reg_No procedure_date_dp procedure_date_tp;
 if first.procedure_date_dp then do;
  if     first.procedure_date_dp NE last.procedure_date_dp
     AND procedure_date_tp = '00:00:00't
   then delete;
  end;
run;
/* end of program */

 

Regards,

Koen

 

sbxkoenk
SAS Super FREQ
Quick addendum: if there is only one record for a Reg_no & day combination, it is kept in the output in any case (even if the time was 00:00:00).
Koen
jhlaramore
SAS Employee
proc sql;
create table NEWDATASETNAME(drop=count_flag time_flag) as
select *
, case when count(reg_no)>1 then 1 else 0 end as count_flag
, case when timepart(procedure_date)='00:00:00't then 1 else 0 end as time_flag
from CURRENTDATASETNAME
group by reg_no,datepart(procedure_date)
having calculated count_flag + calculated time_flag < 2;
quit;

 

smunigala
Obsidian | Level 7

Thank you all for helping me with this!

Sat

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2520 views
  • 2 likes
  • 4 in conversation