Hi all,
I have a dataset looking at orthopedic surgery procedures over time. Date of procedure is recorded as DATETIME20. I have some duplicate entries for same procedure on same date where some entries show datetime exactly (for eg. 10OCT2016 04:25:10) and some show just the date and 0 for time (for eg. 10OCT2016 00:00:00). FOr those with duplicate entries, I need to pick up the ones with exact date time (10OCT2016 04:25:10) and delete one with no time on it (10OCT2016 00:00:00).
However, I have some observations with no duplicate entries on same date, but with date and 0 for time which I need to retain (for eg. 11OCT2016 00:00:00).
I would appreciate if I can get help in removing duplicate entries for same day with 0 time value but retaining unique observations with date and 0 time value.
ID variable: Reg_No
Date of Surgery: PROCEDURE_DATE (datetime 20.)
Here is the sample data from my study. I just want to remove the one I marked DELETE.
Reg_No PROCEDURE_DATE
1001 11OCT2016 00:00:00 --Retain
1001 12OCT2016 00:00:00 --Delete
1001 12OCT2016 01:20:01 --Retain
1001 13OCT2016 00:00:00 --Retain
1001 14OCT2016 02:30:10 --Retain
1002 13OCT2016 00:00:00 --Retain
1002 14OCT2016 22:01:00 --Retain
1002 15OCT2016 00:00:00 --Retain
1003 15OCT2016 00:00:00 --Retain
Thank you!
Sat
The below solution supports the situation where you have 2+ records for the same Reg_no and same day.
It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.
data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001 11OCT2016 00:00:00
1001 12OCT2016 00:00:00
1001 12OCT2016 01:20:01
1001 13OCT2016 00:00:00
1001 14OCT2016 02:30:10
1002 13OCT2016 00:00:00
1002 14OCT2016 22:01:00
1002 15OCT2016 00:00:00
1003 15OCT2016 00:00:00
;
run;
proc sort data=work.smunigala NODUPKEY;
by Reg_No procedure_date_dp procedure_date_tp;
run;
data work.smunigala1;
set work.smunigala;
by Reg_No procedure_date_dp procedure_date_tp;
if first.procedure_date_dp then do;
if first.procedure_date_dp NE last.procedure_date_dp
AND procedure_date_tp = '00:00:00't
then delete;
end;
run;
/* end of program */
Regards,
Koen
If the data is sorted by Reg_no and procedure_date then:
data temp;
set have;
dateonly=datepart (procedure_date);
run;
data want;
set temp;
by reg_no dateonly;
if last.dateonly;
run;
seems to do what you request.
Hi,
Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.
Thanks,
Satish
@smunigala wrote:
Hi,
Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.
Thanks,
Satish
Did you try the code?
Do you have records with more than 2 of the same date?
The below solution supports the situation where you have 2+ records for the same Reg_no and same day.
It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.
data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001 11OCT2016 00:00:00
1001 12OCT2016 00:00:00
1001 12OCT2016 01:20:01
1001 13OCT2016 00:00:00
1001 14OCT2016 02:30:10
1002 13OCT2016 00:00:00
1002 14OCT2016 22:01:00
1002 15OCT2016 00:00:00
1003 15OCT2016 00:00:00
;
run;
proc sort data=work.smunigala NODUPKEY;
by Reg_No procedure_date_dp procedure_date_tp;
run;
data work.smunigala1;
set work.smunigala;
by Reg_No procedure_date_dp procedure_date_tp;
if first.procedure_date_dp then do;
if first.procedure_date_dp NE last.procedure_date_dp
AND procedure_date_tp = '00:00:00't
then delete;
end;
run;
/* end of program */
Regards,
Koen
proc sql;
create table NEWDATASETNAME(drop=count_flag time_flag) as
select *
, case when count(reg_no)>1 then 1 else 0 end as count_flag
, case when timepart(procedure_date)='00:00:00't then 1 else 0 end as time_flag
from CURRENTDATASETNAME
group by reg_no,datepart(procedure_date)
having calculated count_flag + calculated time_flag < 2;
quit;
Thank you all for helping me with this!
Sat
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.