BookmarkSubscribeRSS Feed
Citrine10
Obsidian | Level 7

Hi there,

I have a column containing really long test and I am struggling to extract the date portion in the string. The date format is yyyymmdd.

An example of the string:

Schoolrecords_class1B_20190809_2209.doc

The character strings vary and I therefore cannot specify a position of the string. 

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

But the format is always 8 digits, correct?

Citrine10
Obsidian | Level 7
correct
PeterClemmensen
Tourmaline | Level 20

Ok. Try this

 

data want;
   string = 'Schoolrecords_class1B_20190809_2209.doc';
   c      = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
This works for some records but not for a record like this:


data want;
string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
c = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
PeterClemmensen
Tourmaline | Level 20

Small change

 

data want;
   string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
   c      = prxchange('s/.*(\d{8}).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
I think we are almost there. Its just a few odd ones. I thought it was the space and used compress however that didn't work:


data want;
string = "TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc";
c = compress(prxchange('s/.*(\d{8}).*/$1/', -1, string));
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
PeterClemmensen
Tourmaline | Level 20

Changed the regular expression, so it doesn't catch 12345678 etc.

 

data want;
   string = 'TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc';
   c      = prxchange('s/.*([12]\d{3}[01]\d[0123]\d).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
Thank you so much, This worked. Appreciate your help.

One last thing, how do I do it in the case where I only want the date after the DDD
GR1_ClassGr_ DDD1DDD1245620210104_23?
e.g. I would need the 2021-01-04 from the above
This is another dataset where the date needs to be extracted.
PeterClemmensen
Tourmaline | Level 20

Glad to help 🙂

 

Can you describe the exact logic of the string / date you want to extract?

Citrine10
Obsidian | Level 7
for example if the string is: GR1_ClassGr_ DDD1DDD1245620210104_23?
e.g. I would need the date: 2021-01-04
Aku
Obsidian | Level 7 Aku
Obsidian | Level 7

With scan function you can pick third "word" from original string. 

 


data want;
set have;
Attrib Datevar format=yymmdd10.;
dateVar = input(Scan(DocName,3,'_'),yymmdd10.);
run;

Ksharp
Super User
data have;
input have $80.;
pid=prxparse('/\d{8}/');
call prxsubstr(pid,have,p,l);
if p then want=substr(have,p,l);
drop pid p l;
cards;
Schoolrecords_class1B_20190809_2209.doc
;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 4437 views
  • 0 likes
  • 4 in conversation