Hi,
I was given a huge dataset containing 2 data fields (procedurestarttime & procedureendtime) with multiple time formats (example below). The time interval values are not always correct. Therefore, I want to calculate the time interval from start and end time and only use the inputted time_interval field only when everything else is missing.
I imagine I need a seires of IF THEN DO statements to separately strip the dates, pm, AM and insert ‘:’s when necessary followed by informats/formats. I’m not sure how to identify the different string types in order to apply a conversion.
I'd appreciate any suggestions regarding possible approaches. Thanks!
ProcedureStarttime ProcedureEndtime Time_interval
14:08 15:31 1:23
8:56:00 10:40:00 1:44:00
7:14:00 AM 8:14:00 AM 1:00:00
1/23/2018 11:14 1/23/2018 12:03 12:49:00 AM
2/8/2018 18:47 2/8/2018 19:50 1:03
10:00am 12:00pm 120
1350 1532 Not Reported
01-06-2008 09:35 01-06-2008 11:10 1:35
10:00am 12:00pm 120
1350 1532 Not Reported
01-06-2008 09:35 01-06-2008 11:10 1:35
data have1;
set have;
if procedurestarttime="DON'T KNOW" then do;
starttime = tranwrd(procedurestarttime, '/ /', ' ');
end;
if procedurestarttime="DON'T KNOW" then do;
starttime = tranwrd(procedurestarttime,'AM', ' ');
end;
if procedurestarttime="DON'T KNOW" then do;
starttime = timepart(procedurestarttime);
end;
/*Do same for procedurendtime*/
run;
If you are willing to work with actual time values instead of strings this does most of what you want. The "not reported" being an ambiguous time value end up as missing values.
data example; infile datalines dsd dlm=','; informat str $20.; input str; if length(str)> 10 then do; dt = input(str,anydtdtm32.); time= timepart(dt); end; else time= input(str,time8.); format time time8.; datalines; "14:08" "8:56:00" "7:14:00 AM" "1/23/2018 11:14" "2/8/2018 18:47" "10:00am" "1350" "01-06-2008 09:35" "10:00am" "1350" "01-06-2008 09:35" ;
the key part is in the if /then/else. You didn't provide actual data so I dummied up your values into something to manipulate.
Time values I suspect in the long run will be easier to work with in the long run as you can use the time functions to pull out things like hour or minute or intervals with the INTCK function or increment with INTNX. The format I assigned, TIME8. will show time using a 24 hour clock. If you want to see AM/PM then you could use the TIMEAMPM format. Or even create a custom appearance using the Proc Format Picture statement.
If you are willing to work with actual time values instead of strings this does most of what you want. The "not reported" being an ambiguous time value end up as missing values.
data example; infile datalines dsd dlm=','; informat str $20.; input str; if length(str)> 10 then do; dt = input(str,anydtdtm32.); time= timepart(dt); end; else time= input(str,time8.); format time time8.; datalines; "14:08" "8:56:00" "7:14:00 AM" "1/23/2018 11:14" "2/8/2018 18:47" "10:00am" "1350" "01-06-2008 09:35" "10:00am" "1350" "01-06-2008 09:35" ;
the key part is in the if /then/else. You didn't provide actual data so I dummied up your values into something to manipulate.
Time values I suspect in the long run will be easier to work with in the long run as you can use the time functions to pull out things like hour or minute or intervals with the INTCK function or increment with INTNX. The format I assigned, TIME8. will show time using a 24 hour clock. If you want to see AM/PM then you could use the TIMEAMPM format. Or even create a custom appearance using the Proc Format Picture statement.
This works perfectly! Thank you!
With the exception where one or all 3 fields have missing values, the above is a representative sample of the combinations found in the dataset.
@eap wrote:
With the exception where one or all 3 fields have missing values, the above is a representative sample of the combinations found in the dataset.
Is this a response to something? Can not tell what it relates to.
You probably need to use different INFORMAT based on how the data looks. Here is method that uses / or - to indicate that date is included. Then checks whether : is included to check for whether to use HHMMSS informat or not.
data have;
infile cards dsd truncover dlm='|';
input ProcedureStarttime :$32. ProcedureEndtime :$32. Time_interval :$32. ;
cards;
14:08 | 15:31 | 1:23
8:56:00 | 10:40:00 | 1:44:00
7:14:00 AM | 8:14:00 AM | 1:00:00
1/23/2018 11:14 | 1/23/2018 12:03 | 12:49:00 AM
2/8/2018 18:47 | 2/8/2018 19:50 | 1:03
10:00am | 12:00pm | 120
1350 | 1532 | Not Reported
01-06-2008 09:35 | 01-06-2008 11:10 | 1:35
| |
10:00am | 12:00pm | 120
1350 | 1532 | Not Reported
01-06-2008 09:35 | 01-06-2008 11:10 | 1:35
;
data want ;
set have;
length want $32 ;
if cmiss(ProcedureStarttime,ProcedureEndtime) then want=' ';
else if indexc(cats(ProcedureStarttime,ProcedureEndtime),'/-') then do;
time_diff = input(ProcedureEndtime,anydtdtm32.)-input(ProcedureStarttime,anydtdtm32.);
end;
else if indexc(cats(ProcedureStarttime,ProcedureEndtime),':') then do;
time_diff = input(ProcedureEndtime,anydttme32.)-input(ProcedureStarttime,anydttme32.);
end;
else do;
time_diff = input(ProcedureEndtime,hhmmss.)-input(ProcedureStarttime,hhmmss.);
end;
if not missing(time_diff) then want=put(time_diff,hhmm10.);
drop time_diff;
run;
proc print;
run;
Procedure Time_ Obs Starttime ProcedureEndtime interval want 1 14:08 15:31 1:23 1:23 2 8:56:00 10:40:00 1:44:00 1:44 3 7:14:00 AM 8:14:00 AM 1:00:00 1:00 4 1/23/2018 11:14 1/23/2018 12:03 12:49:00 AM 0:49 5 2/8/2018 18:47 2/8/2018 19:50 1:03 1:03 6 10:00am 12:00pm 120 2:00 7 1350 1532 Not Reported 1:42 8 01-06-2008 09:35 01-06-2008 11:10 1:35 1:35 9 10 10:00am 12:00pm 120 2:00 11 1350 1532 Not Reported 1:42 12 01-06-2008 09:35 01-06-2008 11:10 1:35 1:35
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.