I want to create 2 new variables. Period and DTS from the following example:
999999999K 20APR23:11:39:00 AM
Desired results:
Period
20/04/2023
DTS
20Apr2023 11:39:00 am
Same as @Kurt_Bremser but using an informat that converts a source string with AM|PM directly to a SAS datetime value.
data have;
infile datalines truncover;
input str $100.;
datalines;
999999999K 20APR23:11:39:00 AM
;
data want;
set have;
length dttm_string $25.;
dttm_string=substr(compbl(str),length(scan(str,1,' '))+2);
format dts dateampm22. period ddmmyys10.;
dts=input(dttm_string,datetime19.);
period=datepart(dts);
run;
proc print data=want;
run;
It's always hard to generalize when we are presented with only one example of such data. There are many questions: is the date and time always preceded by text and a blank space? Is the date and time always in the form 20APR23:11:39:00 AM ? Is there always a space before the AM or PM?
However, for this example:
data have;
string='999999999K 20APR23:11:39:00 AM';
run;
data want;
set have;
second_word=scan(string,2,' ');
third_word=scan(string,3,' ');
date=input(second_word,date7.);
time=input(cats(substr(second_word,9),third_word),time10.);
format date date9. time time8.;
drop second_word third_word;
run;
If that does not work for you on your larger data set, please provide more examples that represents any real-world variability in formatting.
I would parse the data with a regular expression:
data have;
string='999999999K 20APR23:11:39:00 AM';
run;
data want;
prxid=prxparse('/ ([^:]*):(.*)$/');
set have;
if prxmatch(prxid,string);
period=input(prxposn(prxid,1,string),date.);
dts=dhms(period,0,0,input(prxposn(prxid,2,string),time.));
drop prxid;
format period ddmmyy10. dts datetime23.;
run;
The PRX expression looks for a blank, followed by something that is not a colon (the first capture buffer), followed by a colon (not captured), and then the rest of the string ("$" means end of string) in the second capture buffer. Once you have the parts its a piece of cake - I put the time value in the seconds parameter of DHMS(), as a SAS time value is actually the number of seconds since midnight, instead of using 3 function calls to get the hours, minutes and seconds.
Of course, if your data is in a flat file, and not in SAS data, use INPUT (replace the CARDS with the actual physical infile):
data want;
input somestring : $20. period date7. +1 time time11.;
format period ddmmyy10.;
dts=dhms(period,0,0,time);
format dts datetime23.;
cards;
999999999K 20APR23:11:39:00 AM
;run;
Use the SCAN function to dissect the string. Convert the 2nd substring to a datetime value with the DATETIME16. informat, and add 12 hours (12 * 3600 seconds) if the 3rd substring is PM.
Same as @Kurt_Bremser but using an informat that converts a source string with AM|PM directly to a SAS datetime value.
data have;
infile datalines truncover;
input str $100.;
datalines;
999999999K 20APR23:11:39:00 AM
;
data want;
set have;
length dttm_string $25.;
dttm_string=substr(compbl(str),length(scan(str,1,' '))+2);
format dts dateampm22. period ddmmyys10.;
dts=input(dttm_string,datetime19.);
period=datepart(dts);
run;
proc print data=want;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.