Ja5ya
Fluorite | Level 6

Hi, 

Can someone help me with this? My data follow the same format as the example below, and I'd like to pull the word that comes right after a long matching phrase.

 

Example:

string = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.'

 

Output:

match1 = 8/21/20

match2 = 9/21/20

Accepted Solution
Kurt_Bremser
Super User

Brute force:

data have;
input string $80.;
datalines4;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;;;;

data want;
set have;
/* The date is the 6th word counted from "The" (blank and period are delimiters) */
i = find(string,"The shop will close on");
if i then match1 = input(scan(substr(string,i),6," ."),mmddyy10.);
/* The date is the 5th word counted from "They" */
i = find(string,"They will have until");
if i then match2 = input(scan(substr(string,i),5," ."),mmddyy10.);
format match1 match2 mmddyy10.;
drop i;
run;
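A variant of the same brute-force idea that avoids counting words by hand: start reading right after the end of each phrase and take the first word. This is only a sketch and is not from the original post. The temporary array `phrase` and the loop variable `k` are names introduced here for illustration, and it assumes the wanted date is always the word immediately following its phrase.

```sas
data have;
  infile datalines truncover;
  input string $char100.;
  datalines;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;

data want;
  set have;
  /* Illustrative phrase list: match[k] receives the word after phrase[k] */
  array phrase[2] $40 _temporary_ ("The shop will close on" "They will have until");
  array match[2];
  do k = 1 to dim(phrase);
    /* STRIP drops the padding blanks so FIND searches for the phrase itself */
    i = find(string, strip(phrase[k]));
    /* LENGTH ignores trailing blanks, so i + LENGTH() points just past the phrase */
    if i then match[k] = input(scan(substr(string, i + length(phrase[k])), 1, " ."), mmddyy10.);
  end;
  format match1 match2 mmddyy10.;
  drop i k;
run;
```

To handle another phrase, extend both arrays by one element and add the new variable to the FORMAT statement.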


Ja5ya
Fluorite | Level 6
No. Because "on" and "until" are very common words, I need to match the whole phrase in front of the date.
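When the whole phrase has to be present, one option is to make the phrase the literal part of a Perl regular expression and capture the word that follows it with PRXMATCH and PRXPOSN. This is a sketch, assuming the wanted word never contains a blank or a period; `rx1`, `rx2`, `word1` and `word2` are illustrative names, not from the thread.

```sas
data have;
  infile datalines truncover;
  input string $char100.;
  datalines;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
;

data want(keep=string match1 match2);
  set have;
  length word1 word2 $20;
  retain rx1 rx2;
  /* Compile the patterns once. The full phrase is literal text, and the */
  /* capture group takes the following run of characters that are       */
  /* neither blanks nor periods, so "8/21/20." yields "8/21/20".          */
  if _n_ = 1 then do;
    rx1 = prxparse('/The shop will close on\s+([^\s.]+)/');
    rx2 = prxparse('/They will have until\s+([^\s.]+)/');
  end;
  /* PRXPOSN returns capture buffer 1 from the preceding PRXMATCH call */
  if prxmatch(rx1, string) then word1 = prxposn(rx1, 1, string);
  if prxmatch(rx2, string) then word2 = prxposn(rx2, 1, string);
  /* ?? suppresses log notes if the captured word is not a valid date */
  match1 = input(word1, ?? mmddyy10.);
  match2 = input(word2, ?? mmddyy10.);
  format match1 match2 mmddyy10.;
run;
```

Because "on" or "until" alone never matches either pattern, unrelated occurrences of those words are ignored.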
Ja5ya
Fluorite | Level 6
Thanks, this helps!
hhinohar
Quartz | Level 8

Something like this?

 

data _null_;
   ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
   text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
   start = 1;
   stop = length(text);
      /* Use PRXNEXT to find the first instance of the pattern, */
      /* then use DO WHILE to find all further instances.       */
      /* PRXNEXT changes the start parameter so that searching  */
      /* begins again after the last match.                     */
   call prxnext(ExpressionID, start, stop, text, position, length);
      do while (position > 0);
         match = substr(text, position, length);
         put match=;
         call prxnext(ExpressionID, start, stop, text, position, length);
      end;
run;
Ja5ya
Fluorite | Level 6
Almost. I'd need to assign the matches to different variables, though: the first match to one variable and the second match to another.
hhinohar
Quartz | Level 8

After some struggle, this is what I came up with.
There is probably much better code, but this is the best I can offer.

 

data want;
   ExpressionID = prxparse('/(\d+\/\d+\/\d+)/');
   text = 'The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.';
   start = 1;
   stop = length(text);
   /* One variable per expected match: match1, match2 */
   array match[2] $10;
   i = 0;
   call prxnext(ExpressionID, start, stop, text, position, length);
   /* Stop when no match is left or when every array slot is filled */
   do while (position > 0 and i < dim(match));
      i + 1;
      match[i] = substr(text, position, length);
      put match[i]=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
   keep match1 match2;
run;
RichardDeVen
Barite | Level 11

You might be better off saving more of the 'pretext' (the context preceding each date).

 

Example:

Find up to 3 dates in slash format in each line, along with the pretext before each one.

data have;
infile cards truncover;
input text $char100.;
datalines;
1/1/2020 opening
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
Somebody opened on 20AUG2020, a day before me!
Last call
closing 8/22/2020
;

data want(keep=text date: pretext:);
  set have;

  array pretext(3) $100 ;
  array datestr(3) $10;

  rxid = prxparse ('/(\d{1,2}\/\d{1,2}\/\d{2,4})/');

  start = 1;
  stop = -1;

  put /text=;

  do index = 1 to dim(datestr);
    prepos = start;
    call prxnext(rxid, start, stop, text, position, length);
    put prepos= start= stop= position= length=;

    if position > prepos then 
      pretext(index) = substr(text, prepos, position-prepos);

    if position > 0 then
      datestr(index) = substr(text, position, length);
    else
      pretext(index) = substr(text, start);

    if length = 0 then leave;
  end;
run;
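Building on the pretext idea, the sketch below uses the text before each date to decide which variable receives it, so a date is assigned by its phrase rather than by its position in the line. It assumes the phrases "will close on" and "will have until" identify the two dates; `close_date` and `reopen_by` are hypothetical variable names chosen for illustration.

```sas
data have;
  infile datalines truncover;
  input text $char100.;
  datalines;
The shop will close on 8/21/20. They will have until 9/21/20 to reopen it again.
They will have until 9/30/20 to reopen it again.
;

data want(keep=text close_date reopen_by);
  set have;
  length pretext $100 datestr $10;
  retain rxid;
  if _n_ = 1 then rxid = prxparse('/\d{1,2}\/\d{1,2}\/\d{2,4}/');

  start = 1;
  stop = -1;      /* -1: search up to the last nonblank character */
  prepos = 1;     /* where the text before the current date begins */
  call prxnext(rxid, start, stop, text, position, len);
  do while (position > 0);
    /* Text between the previous date (or the line start) and this date */
    if position > prepos then pretext = substr(text, prepos, position - prepos);
    else pretext = ' ';
    datestr = substr(text, position, len);
    /* Route the date by the phrase that precedes it */
    if find(pretext, 'will close on') then close_date = input(datestr, mmddyy10.);
    else if find(pretext, 'will have until') then reopen_by = input(datestr, mmddyy10.);
    prepos = start; /* PRXNEXT has moved START past the current match */
    call prxnext(rxid, start, stop, text, position, len);
  end;
  format close_date reopen_by mmddyy10.;
run;
```

In the second line only the reopening phrase appears, so `close_date` stays missing instead of receiving the wrong date.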

 
