Hi there,
I have a column containing really long test and I am struggling to extract the date portion in the string. The date format is yyyymmdd.
An example of the string:
Schoolrecords_class1B_20190809_2209.doc
The character strings vary and I therefore cannot specify a position of the string.
But the format is always 8 digits, correct?
Ok. Try this
data want;
   string = 'Schoolrecords_class1B_20190809_2209.doc';
   c      = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;Small change
data want;
   string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
   c      = prxchange('s/.*(\d{8}).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;Changed the regular expression, so it doesn't catch 12345678 etc.
data want;
   string = 'TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc';
   c      = prxchange('s/.*([12]\d{3}[01]\d[0123]\d).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;Glad to help 🙂
Can you describe the exact logic of the string / date you want to extract?
With scan function you can pick third "word" from original string.
data want;
set have;
Attrib Datevar format=yymmdd10.;
dateVar = input(Scan(DocName,3,'_'),yymmdd10.);
run;
data have;
input have $80.;
pid=prxparse('/\d{8}/');
call prxsubstr(pid,have,p,l);
if p then want=substr(have,p,l);
drop pid p l;
cards;
Schoolrecords_class1B_20190809_2209.doc
;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
