BookmarkSubscribeRSS Feed
Citrine10
Obsidian | Level 7

Hi there,

I have a column containing really long test and I am struggling to extract the date portion in the string. The date format is yyyymmdd.

An example of the string:

Schoolrecords_class1B_20190809_2209.doc

The character strings vary and I therefore cannot specify a position of the string. 

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

But the format is always 8 digits, correct?

Citrine10
Obsidian | Level 7
correct
PeterClemmensen
Tourmaline | Level 20

Ok. Try this

 

data want;
   string = 'Schoolrecords_class1B_20190809_2209.doc';
   c      = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
This works for some records but not for a record like this:


data want;
string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
c = prxchange('s/.*_(\d{8})_.*/$1/', -1, string);
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
PeterClemmensen
Tourmaline | Level 20

Small change

 

data want;
   string = 'TST_NoRecs_BBBGRADE12BEFE20210701_220019.doc';
   c      = prxchange('s/.*(\d{8}).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
I think we are almost there. Its just a few odd ones. I thought it was the space and used compress however that didn't work:


data want;
string = "TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc";
c = compress(prxchange('s/.*(\d{8}).*/$1/', -1, string));
date = input(c, yymmdd8.);
format date yymmdd10.;
run;
PeterClemmensen
Tourmaline | Level 20

Changed the regular expression, so it doesn't catch 12345678 etc.

 

data want;
   string = 'TST_NoRecs_ BBBGRADE22010508_24-T123456789.doc';
   c      = prxchange('s/.*([12]\d{3}[01]\d[0123]\d).*/$1/', -1, string);
   date   = input(c, yymmdd8.);
   format date yymmdd10.;
run;
Citrine10
Obsidian | Level 7
Thank you so much, This worked. Appreciate your help.

One last thing, how do I do it in the case where I only want the date after the DDD
GR1_ClassGr_ DDD1DDD1245620210104_23?
e.g. I would need the 2021-01-04 from the above
This is another dataset where the date needs to be extracted.
PeterClemmensen
Tourmaline | Level 20

Glad to help 🙂

 

Can you describe the exact logic of the string / date you want to extract?

Citrine10
Obsidian | Level 7
for example if the string is: GR1_ClassGr_ DDD1DDD1245620210104_23?
e.g. I would need the date: 2021-01-04
Aku
Obsidian | Level 7 Aku
Obsidian | Level 7

With scan function you can pick third "word" from original string. 

 


data want;
set have;
Attrib Datevar format=yymmdd10.;
dateVar = input(Scan(DocName,3,'_'),yymmdd10.);
run;

Ksharp
Super User
data have;
input have $80.;
pid=prxparse('/\d{8}/');
call prxsubstr(pid,have,p,l);
if p then want=substr(have,p,l);
drop pid p l;
cards;
Schoolrecords_class1B_20190809_2209.doc
;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1665 views
  • 0 likes
  • 4 in conversation