Hi all,
I have a dataset looking at orthopedic surgery procedures over time. Date of procedure is recorded as DATETIME20. I have some duplicate entries for same procedure on same date where some entries show datetime exactly (for eg. 10OCT2016 04:25:10) and some show just the date and 0 for time (for eg. 10OCT2016 00:00:00). FOr those with duplicate entries, I need to pick up the ones with exact date time (10OCT2016 04:25:10) and delete one with no time on it (10OCT2016 00:00:00).
However, I have some observations with no duplicate entries on same date, but with date and 0 for time which I need to retain (for eg. 11OCT2016 00:00:00).
I would appreciate if I can get help in removing duplicate entries for same day with 0 time value but retaining unique observations with date and 0 time value.
ID variable: Reg_No
Date of Surgery: PROCEDURE_DATE (datetime 20.)
Here is the sample data from my study. I just want to remove the one I marked DELETE.
Reg_No PROCEDURE_DATE
1001 11OCT2016 00:00:00 --Retain
1001 12OCT2016 00:00:00 --Delete
1001 12OCT2016 01:20:01 --Retain
1001 13OCT2016 00:00:00 --Retain
1001 14OCT2016 02:30:10 --Retain
1002 13OCT2016 00:00:00 --Retain
1002 14OCT2016 22:01:00 --Retain
1002 15OCT2016 00:00:00 --Retain
1003 15OCT2016 00:00:00 --Retain
Thank you!
Sat
The below solution supports the situation where you have 2+ records for the same Reg_no and same day.
It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.
data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001 11OCT2016 00:00:00
1001 12OCT2016 00:00:00
1001 12OCT2016 01:20:01
1001 13OCT2016 00:00:00
1001 14OCT2016 02:30:10
1002 13OCT2016 00:00:00
1002 14OCT2016 22:01:00
1002 15OCT2016 00:00:00
1003 15OCT2016 00:00:00
;
run;
proc sort data=work.smunigala NODUPKEY;
by Reg_No procedure_date_dp procedure_date_tp;
run;
data work.smunigala1;
set work.smunigala;
by Reg_No procedure_date_dp procedure_date_tp;
if first.procedure_date_dp then do;
if first.procedure_date_dp NE last.procedure_date_dp
AND procedure_date_tp = '00:00:00't
then delete;
end;
run;
/* end of program */
Regards,
Koen
If the data is sorted by Reg_no and procedure_date then:
data temp;
set have;
dateonly=datepart (procedure_date);
run;
data want;
set temp;
by reg_no dateonly;
if last.dateonly;
run;
seems to do what you request.
Hi,
Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.
Thanks,
Satish
@smunigala wrote:
Hi,
Thanks for the quick reply. However, I am looking to remove duplicates only for the same day and not for different days. Can you check the sample data I specified? Each Reg_No has unique procedure dates (with and without time) which I want to retain. Only if a Reg_No has duplicate entries on same day, I need to pick up the procedure dates qith specified time.
Thanks,
Satish
Did you try the code?
Do you have records with more than 2 of the same date?
The below solution supports the situation where you have 2+ records for the same Reg_no and same day.
It will throw away the first occurrence if and only if the time = 00:00:00 while keeping all other records for that same Reg_no and same day.
data work.smunigala;
input Reg_No PROCEDURE_DATE_Char $ 15-32;
PROCEDURE_DATE=input(PROCEDURE_DATE_Char,datetime20.);
procedure_date_dp=datepart(PROCEDURE_DATE);
procedure_date_tp=timepart(PROCEDURE_DATE);
format PROCEDURE_DATE datetime20. procedure_date_dp date9. procedure_date_tp time8.;
datalines;
1001 11OCT2016 00:00:00
1001 12OCT2016 00:00:00
1001 12OCT2016 01:20:01
1001 13OCT2016 00:00:00
1001 14OCT2016 02:30:10
1002 13OCT2016 00:00:00
1002 14OCT2016 22:01:00
1002 15OCT2016 00:00:00
1003 15OCT2016 00:00:00
;
run;
proc sort data=work.smunigala NODUPKEY;
by Reg_No procedure_date_dp procedure_date_tp;
run;
data work.smunigala1;
set work.smunigala;
by Reg_No procedure_date_dp procedure_date_tp;
if first.procedure_date_dp then do;
if first.procedure_date_dp NE last.procedure_date_dp
AND procedure_date_tp = '00:00:00't
then delete;
end;
run;
/* end of program */
Regards,
Koen
proc sql;
create table NEWDATASETNAME(drop=count_flag time_flag) as
select *
, case when count(reg_no)>1 then 1 else 0 end as count_flag
, case when timepart(procedure_date)='00:00:00't then 1 else 0 end as time_flag
from CURRENTDATASETNAME
group by reg_no,datepart(procedure_date)
having calculated count_flag + calculated time_flag < 2;
quit;
Thank you all for helping me with this!
Sat
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.