kaziumair
Quartz | Level 8

Hi, I want to extract the date from the following string:

"<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">
"

 

The date is 210312.

How can I extract it?

Accepted Solutions
Kurt_Bremser
Super User

Brute force attack:

data have;
string = '<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

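data want;
set have;
/* locate the link attribute; the search text is 12 characters long */
pos = index(string,'link="https:');
/* skip the search text plus one slash, then take the second-to-last piece delimited by slashes (the article slug) */
substring = scan(substr(string,pos+13),-2,"/");
/* the first six characters of the slug are the date in yymmdd form */
date = input(substr(substring,1,6),yymmdd6.);
format date yymmdd10.;
run;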

Someone will come up with a clever application of PRXMATCH, I'm sure.

kaziumair
Quartz | Level 8
Thank you for your help.
andreas_lds
Jade | Level 19

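A regular expression is the most likely route. For the text you have posted, the expression

\/(\d{6})\D

seems OK: it matches a slash, captures the six digits that follow, and requires a non-digit after them.

See the documentation for PRXMATCH, PRXPARSE and PRXPOSN for details.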
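As a rough sketch of how those three functions could fit together with the expression above (not tested against real data): it assumes the data set HAVE and the variable STRING from the accepted solution. The names rx and datetext are made up for illustration.

```sas
data want;
  set have;
  length datetext $6;
  retain rx;
  /* compile the pattern once, on the first observation */
  if _n_ = 1 then rx = prxparse('/\/(\d{6})\D/');
  /* PRXMATCH returns the match position, or 0 if there is no match */
  if prxmatch(rx, string) then do;
    /* capture buffer 1 holds the six digits */
    datetext = prxposn(rx, 1, string);
    /* read the digits as yymmdd, as in the accepted solution */
    date = input(datetext, yymmdd6.);
  end;
  format date yymmdd10.;
  drop rx;
run;
```

If no slash followed by six digits is found, datetext stays blank and date stays missing, which makes unmatched rows easy to spot.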

kaziumair
Quartz | Level 8
Hi, thanks for your suggestion. I will check the docs mentioned.
PaalNavestad
Pyrite | Level 9

Hi, one way would be to use a regular expression to find six digits in a row, or maybe a / followed by six digits; then you have your date. Further, I would split the string into year, month and day as numbers and use the MDY function. You may also try the anydtdte. informat.
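A minimal sketch of the split-and-MDY idea, assuming the data set HAVE and variable STRING from the accepted solution. The variable names p, yy, mm, dd and datetext are illustrative, and adding 2000 to the two-digit year is an assumption that all dates fall in this century.

```sas
data want;
  set have;
  length datetext $6;
  /* position of a slash that is followed by six digits and then a non-digit */
  p = prxmatch('/\/\d{6}\D/', string);
  if p > 0 then do;
    /* the six digits start one character after the slash */
    datetext = substr(string, p + 1, 6);
    yy = input(substr(datetext, 1, 2), 2.);
    mm = input(substr(datetext, 3, 2), 2.);
    dd = input(substr(datetext, 5, 2), 2.);
    /* assumption: every year is 20yy */
    date = mdy(mm, dd, 2000 + yy);
  end;
  format date yymmdd10.;
  drop p yy mm dd;
run;
```

MDY returns a missing value for an impossible month or day, so invalid digit runs do not produce a wrong date.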

 

Kurt_Bremser
Super User

Don't use the ANY* informats for strings where the date structure is not very clear. With the given string, you might end up with 2012-03-21 or 2021-03-12, depending on the locale of the SAS session.

Since you can only use the ANY* informats reliably when the structure is already clear, you never use them, unless you want unpredictable results.
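By way of contrast, here is a small sketch of reading the same six digits with a fixed-structure informat. The data set name check and the variable datetext are made up for illustration. The century chosen for the two-digit year still depends on the YEARCUTOFF= system option of the session.

```sas
data check;
  datetext = '210312';
  /* YYMMDD6. always reads year, month, day in that order */
  /* the ?? modifier suppresses the log note and the _ERROR_ flag for invalid values */
  date = input(datetext, ?? yymmdd6.);
  if missing(date) then put 'Not a valid yymmdd value: ' datetext;
  format date yymmdd10.;
run;
```

Because the informat fixes the field order, the result does not change with the session locale.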

Ksharp
Super User
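data want;
set have;
/* position of the first run of six digits directly preceded by a slash */
p=prxmatch('/(?<=\/)\d{6}/',string);
/* keep the six digits as text */
if p then want=substr(string,p,6);
run;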
