🔒 This topic is solved and locked.
kaziumair
Quartz | Level 8

Hi, I want to extract the date from the following string:

<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">

The date is 210312, the six digits at the start of the last path segment of the link URL.

How can I extract it?

1 ACCEPTED SOLUTION

Kurt_Bremser
Super User

Brute force attack:

data have;
string = '<article id="post-13669" class="post-13669 stories type-stories status-publish has-post-thumbnail hentry category-homepage-stories tag-amab tag-amabhungane tag-dewald-van-rensburg tag-factory tag-idc tag-ikra tag-insa tag-investigative-journalism tag-kzn tag-peter-maskell-auctioneers tag-pots tag-turkey-pots" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

data want;
set have;
/* position of the "l" in link="https: */
pos = index(string,'link="https:');
/* link="https: is 12 characters, so pos+13 starts at the second slash of "//";
   then take the next-to-last "/"-delimited word (the last one is the trailing ">) */
substring = scan(substr(string,pos+13),-2,"/");
/* the first six characters of 210312-funder-vs-... are the date in YYMMDD order */
date = input(substr(substring,1,6),yymmdd6.);
format date yymmdd10.;
run;

Someone will come up with a clever application of PRXMATCH, I'm sure.


8 REPLIES
kaziumair
Quartz | Level 8
Thank you for your help.
andreas_lds
Jade | Level 19

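Most likely by using a regular expression. For the text you have posted, the expression

\/(\d{6})\D

seems OK: it matches a slash, captures the six digits that follow it, and requires a non-digit after them.

See the documentation of PRXMATCH, PRXPARSE and PRXPOSN for details.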
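A minimal sketch of the PRXPARSE/PRXMATCH/PRXPOSN route with the expression above. It assumes a shortened copy of the posted string (the class list is trimmed; the link attribute is unchanged), and the variable names `re` and `digits` are only illustrative:

```sas
data have;
string = '<article id="post-13669" class="post-13669 stories" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

data want;
set have;
length digits $6;
retain re;
/* compile the pattern once: a slash, six captured digits, then a non-digit */
if _n_ = 1 then re = prxparse('/\/(\d{6})\D/');
if prxmatch(re, string) then do;
  /* capture buffer 1 holds the six digits */
  digits = prxposn(re, 1, string);
  /* read them as YYMMDD into a SAS date value */
  date = input(digits, yymmdd6.);
end;
format date yymmdd10.;
drop re;
run;
```

Compiling the pattern only on the first iteration (with RETAIN) avoids re-parsing the expression for every observation.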

kaziumair
Quartz | Level 8
Hi, thanks for your suggestion. I will check the docs you mentioned.
PaalNavestad
Pyrite | Level 9

Hi, one way would be to use a regular expression to find six digits in a row, or perhaps a slash followed by six digits; then you have your date. Next, I would split the string into year, month and day as numbers and build the date from them (for example with the MDY function). You could also try the ANYDTDTE. informat.
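A minimal sketch of that split-and-combine idea with the MDY function. It assumes a shortened copy of the posted string (class list trimmed) and, as a labeled assumption, that every two-digit year belongs to the 2000s, hence `2000 + yy`. Variable names are illustrative:

```sas
data have;
string = '<article id="post-13669" class="post-13669 stories" link="https://amabhungane.org/stories/210312-funder-vs-funder-in-r500m-pots-fiasco/">';
run;

data want;
set have;
/* position of the slash that precedes the six digits */
p = prxmatch('/\/\d{6}/', string);
if p then do;
  yy = input(substr(string, p+1, 2), 2.);
  mm = input(substr(string, p+3, 2), 2.);
  dd = input(substr(string, p+5, 2), 2.);
  /* assumption: two-digit years are 20xx */
  date = mdy(mm, dd, 2000 + yy);
end;
format date yymmdd10.;
run;
```

Reading the six digits with the YYMMDD6. informat, as in the accepted solution, does the same in one step but leaves the century to the YEARCUTOFF= system option instead of fixing it explicitly.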

Kurt_Bremser
Super User

Don't use the ANY* informats for strings where the date structure is not very clear. With the given string, you might end up with 2012-03-21 or 2021-03-12, depending on the locale of the SAS session.

Since the ANY* informats only work reliably when the structure is already clear, and in that case you can use the matching specific informat instead, you should never use them unless you want unpredictable results.
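To illustrate the ambiguity, a minimal sketch that reads the same six digits with two explicit informats; naming the informat fixes the field order, whereas ANYDTDTE. has to infer it. The data set and variable names are illustrative, and the century of the two-digit years follows the session's YEARCUTOFF= setting:

```sas
data informat_demo;
raw = '210312';
as_yymmdd = input(raw, yymmdd6.);  /* fields read as YY MM DD */
as_ddmmyy = input(raw, ddmmyy6.);  /* fields read as DD MM YY */
format as_yymmdd as_ddmmyy yymmdd10.;
run;

proc print data=informat_demo;
run;
```

With the default YEARCUTOFF= values, these give 2021-03-12 and 2012-03-21, the two candidates mentioned above.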

Ksharp
Super User
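data want;
set have;
/* position of the first digit that directly follows a slash */
p=prxmatch('/(?<=\/)\d/',string);
if p then do;
  want=substr(string,p,6);      /* the six date digits as text, e.g. 210312 */
  date=input(want,yymmdd6.);    /* read them as YYMMDD into a SAS date */
end;
format date yymmdd10.;
run;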