Ja5ya
Fluorite | Level 6

Hi, 

Can someone help me with this? My data always arrives in the same format as the example below, and I'd like to pull the next word after matching a long string (the phrase that precedes it).

 

Example:

string = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.'

 

Output:

match1 = 8/21/20

match2 = 9/21/20

Accepted Solution
Kurt_Bremser
Super User

Brute force:

data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;

data want;
set have;
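/* find the first phrase; counting from there, the date is the 6th word */
i = find(string,"The shop will close on");
if i then match1 = input(scan(substr(string,i),6," ."),mmddyy10.);
/* find the second phrase; counting from there, the date is the 5th word */
i = find(string,"They will have until");
if i then match2 = input(scan(substr(string,i),5," ."),mmddyy10.);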
format match1 match2 mmddyy10.;
drop i;
run;
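
A regex-based variant of the same phrase-anchored idea, offered as a rough sketch rather than as part of the accepted answer. It assumes the dates are always written in a slash format such as 8/21/20; the names rx1 and rx2 are illustrative only. Spelling out the whole phrase in the pattern keeps common words like "on" and "until" from matching elsewhere in the text.

```sas
data have;
  input string $80.;
  datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;

data want;
  set have;
  /* Each pattern contains the full lead-in phrase; the parentheses capture only the date. */
  rx1 = prxparse('/The shop will close on\s+(\d{1,2}\/\d{1,2}\/\d{2,4})/');
  rx2 = prxparse('/They will have until\s+(\d{1,2}\/\d{1,2}\/\d{2,4})/');
  /* PRXMATCH returns the match position (0 if none); PRXPOSN returns capture buffer 1. */
  if prxmatch(rx1, string) then match1 = input(prxposn(rx1, 1, string), mmddyy10.);
  if prxmatch(rx2, string) then match2 = input(prxposn(rx2, 1, string), mmddyy10.);
  format match1 match2 mmddyy10.;
  drop rx1 rx2;
run;
```

If a phrase is missing from a row, the corresponding variable is simply left missing.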


9 REPLIES
Ja5ya
Fluorite | Level 6
No, since "on" and "until" are very common words. I need to match the whole phrase in front of the date.
Ja5ya
Fluorite | Level 6
Thanks, this helps!
hhinohar
Quartz | Level 8

Something like this?

 

data _null_;
   ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
   text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
   start = 1;
   stop = length(text);
      /* Use PRXNEXT to find the first instance of the pattern, */
      /* then use DO WHILE to find all further instances.       */
      /* PRXNEXT changes the start parameter so that searching  */
      /* begins again after the last match.                     */
   call prxnext(ExpressionID, start, stop, text, position, length);
      do while (position > 0);
         match = substr(text, position, length);
         put match=;
         call prxnext(ExpressionID, start, stop, text, position, length);
      end;
run;
Ja5ya
Fluorite | Level 6
Almost correct, but each match needs to go into its own variable: the first match into one variable and the second match into another.
hhinohar
Quartz | Level 8

After a bit of a struggle, this is what I came up with.
There is probably much better code, but this is the best I can offer.

 

data want;
   ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
   text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
   start = 1;
   stop = length(text);
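   /* $10 is wide enough for mm/dd/yyyy; the default length of 8 would truncate it */
   array match[2] $10;
   call prxnext(ExpressionID, start, stop, text, position, length);
   i = 0;
   /* stop once the array is full so a third date cannot cause a subscript error */
   do while (position > 0 and i < dim(match));
      i + 1;
      match[i] = substr(text, position, length);
      put match[i]=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
   keep match1 match2;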
run;
RichardDeVen
Barite | Level 11

You might be better off also saving the 'pretext' (the text that precedes each date) along with the date.

 

Example:

Find up to 3 'dates' (the size of the arrays below) in a slash date format, along with their pretexts.

data have;
infile cards truncover;
input text $char100.;
datalines;
1/1/2020 opening
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
Somebody opened on 20AUG2020, a day before me!
Last call
closing 8/22/2020
;

data want(keep=text date: pretext:);
  set have;

  array pretext(3) $100 ;
  array datestr(3) $10;

  rxid = prxparse ('/(\d{1,2}\/\d{1,2}\/\d{2,4})/');

  start = 1;
  stop = -1;

  put /text=;

  do index = 1 to dim(datestr);
    prepos = start;
    call prxnext(rxid, start, stop, text, position, length);
    put prepos= start= stop= position= length=;

    if position > prepos then 
      pretext(index) = substr(text, prepos, position-prepos);

    if position > 0 then
      datestr(index) = substr(text, position, length);
    else
      pretext(index) = substr(text, start);

    if length = 0 then leave;
  end;
run;
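
The date strings captured above are still character values. A minimal sketch of turning such strings into SAS date values, assuming month/day/year order; the data set name dates and the sample rows are made up for illustration. The ?? modifier keeps an invalid string from writing an error to the log and leaves the result missing instead.

```sas
data dates;
  input datestr :$10.;
  /* ?? suppresses the invalid-data note and the _ERROR_ flag for bad strings */
  date = input(datestr, ?? mmddyy10.);
  format date date9.;
  datalines;
1/1/2020
8/21/20
13/45/2020
;
```

How a two-digit year such as 20 is resolved depends on the YEARCUTOFF= system option, so it is worth checking that setting before relying on the converted values.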

 
