Hi there,
I have a column containing really long test and I am struggling to extract the date portion in the string. The date format is yyyymmdd.
An example of the string:
Schoolrecords_class1B_20190809_2209.doc
The character strings vary and I therefore cannot specify a position of the string.
But the format is always 8 digits, correct?
Ok. Try this
data want;
string = 'Schoolrecords_class1B_20190809_2209.doc';
c = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
Small change
data want;
string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
c = prxchange('s/.*(\d{8}).*/$1/', -1, string);
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
Changed the regular expression, so it doesn't catch 12345678 etc.
data want;
string = 'TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc';
c = prxchange('s/.*([12]\d{3}[01]\d[0123]\d).*/$1/', -1, string);
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
Glad to help 🙂
Can you describe the exact logic of the string / date you want to extract?
With scan function you can pick third "word" from original string.
data want;
set have;
Attrib Datevar format=yymmdd10.;
dateVar = input(Scan(DocName,3,'_'),yymmdd10.);
run;
data have;
input have $80.;
pid=prxparse('/\d{8}/');
call prxsubstr(pid,have,p,l);
if p then want=substr(have,p,l);
drop pid p l;
cards;
Schoolrecords_class1B_20190809_2209.doc
;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.