BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kaziumair
Quartz | Level 8

Hi , I want to extract date from the following string

"<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">
"

 

the date is -> 210312

 

How can I extract it?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Brute force attack:

data have;
string = '<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

data want;
set have;
pos = index(string,'link="https:');
substring = scan(substr(string,pos+13),-2,"/");
date = input(substr(substring,1,6),yymmdd6.);
format date yymmdd10.;
run;

Someone will come up with a clever application of PRXMATCH, I'm sure.

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Brute force attack:

data have;
string = '<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

data want;
set have;
pos = index(string,'link="https:');
substring = scan(substr(string,pos+13),-2,"/");
date = input(substr(substring,1,6),yymmdd6.);
format date yymmdd10.;
run;

Someone will come up with a clever application of PRXMATCH, I'm sure.

kaziumair
Quartz | Level 8
Thank you for your help .
andreas_lds
Jade | Level 19

Most likely by using a regular expression. For the text you have posted the expression

\/(\d{6})\D

seems ok.

 

See docs of prxmatch, prxparse and prxposn for details.

kaziumair
Quartz | Level 8
Hi , Thanks for your suggestion , I will check the docs mentioned
PaalNavestad
Pyrite | Level 9

Hi, one way would be to use regexp to find 6 numbers in a row or maybe /followed by 6 numbers then you have your date. Further I would split the string into year, month day as numbers and use the yymmdd function. You may also try anydtdte. informat.

 

Kurt_Bremser
Super User

Don't use the ANY* informats for strings where the date structure is not very clear. With the given string, you might end up with 2012-03-21 or 2021-03-12, depending on locale of the SAS session.

Since you can only use the ANY* informats reliably when the structure is clear already, you never use them, unless you want unpredictable results.

Ksharp
Super User
data want;
set have;
p=prxmatch('/(?<=\/)\d/',string);
if p then want=substr(string,p,6);
run;

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2658 views
  • 5 likes
  • 5 in conversation