allaboutsas
Calcite | Level 5

I’m using sample code from SAS here, but I want to get a different result.

data _null_;
   ExpressionID = prxparse('/(?:s|,?)([crb]at) ?(?:,)?/');
   text = 'The woods have a bat, cat and a rat';
   start = 1;
   stop = length(text);

   /* Use PRXNEXT to find the first instance of the pattern, */
   /* then use DO WHILE to find all further instances.       */
   /* PRXNEXT changes the start parameter so that searching  */
   /* begins again after the last match.                     */
   call prxnext(ExpressionID, start, stop, text, position, length);
   do while (position > 0);
      fnd = prxposn(ExpressionID,1,text);
      found = substr(text, position, length);
      put fnd= found= position= length= start= stop=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
run;

/*The following lines are written to the SAS log:*/
/*   found=bat position=18 length=3*/
/*   found=cat position=23 length=3*/
/*   found=rat position=34 length=3*/

What I want to get is this instead:

Found=The woods have a bat

Found= cat and

Found= a rat

Basically, I want to use each found word as a delimiter to split the string into parts (as many as it finds, 3 in this case).

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, keeping your program as intact as possible ...

Add one statement just before DO WHILE:

prior_position = 1;

Then inside the DO WHILE loop, replace FOUND= with:

found = substr(text, prior_position, position - prior_position + 3);

prior_position = position + 3;

You may get ", cat" instead of "cat", but that's probably a decent result.

Good luck.


10 REPLIES
Reeza
Super User

What's your delimiter? If it was bat/cat/rat, your result would be:

the  woods have a bat

cat

and a rat


or possibly


the  woods have a bat

cat and a

rat

allaboutsas
Calcite | Level 5

You are right, it should be:

the  woods have a bat

cat

and a rat

Reeza
Super User

Forget the sample code from SAS. What are you trying to accomplish overall? Can you provide more than one example? Would sat also be a delimiter?

Astounding
PROC Star

Let's add to the list.  Which of these should be considered delimiters?

at

cats

matter

Secretariat

wheat

allaboutsas
Calcite | Level 5

I have to use prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/'), which means only cat, rat and bat are delimiters (words). I just noticed a typo in the original post: it should be \s, which matches a whitespace character. The real program is much more complicated than this one. The goal is to get substrings from the big string by breaking it at any of these 3 words (that is just this sample; the real case has many more).

allaboutsas
Calcite | Level 5

And I know that in this case a word like brat will also match; that is fine. Thanks.
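
As a rough check of which words this pattern actually treats as delimiters, a minimal sketch along the following lines could be run against the candidate list posted earlier in the thread. The variable names re, word and hit are made up for illustration. Because the leading group (?:\s|,?) is allowed to match nothing, [crb]at can also match in the middle of a word, which is why brat is picked up.

```sas
data _null_;
   length word $ 12;
   /* Same pattern as in the post above (with \s). */
   re = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');

   /* Candidate words from the thread, plus brat. */
   do word = 'at', 'cats', 'matter', 'Secretariat', 'wheat', 'brat';
      /* PRXMATCH returns the position where the match starts, or 0 for no match. */
      hit = prxmatch(re, word);
      put word= hit=;
   end;
run;
```

By my reading of the pattern, only cats and brat should report a nonzero position, since the others contain no c, r or b immediately followed by "at".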

Astounding
PROC Star

OK, keeping your program as intact as possible ...

Add one statement just before DO WHILE:

prior_position = 1;

Then inside the DO WHILE loop, replace FOUND= with:

found = substr(text, prior_position, position - prior_position + 3);

prior_position = position + 3;

You may get ", cat" instead of "cat", but that's probably a decent result.

Good luck.
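
For reference, here is a sketch of how the whole step might look with those two changes in place. It is a variation rather than the exact suggestion: instead of the fixed + 3, it asks CALL PRXPOSN where capture buffer 1 (the delimiter word) starts and how long it is. The reason for the swap is that with the \s version of the pattern, position points at the whitespace in front of the word, so a fixed offset of 3 would stop one character short. The names word_pos and word_len are assumptions introduced here.

```sas
data _null_;
   ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
   text = 'The woods have a bat, cat and a rat';
   start = 1;
   stop = length(text);
   prior_position = 1;

   call prxnext(ExpressionID, start, stop, text, position, length);
   do while (position > 0);
      /* Start and length of capture buffer 1, i.e. the delimiter word itself. */
      call prxposn(ExpressionID, 1, word_pos, word_len);

      /* Everything from the end of the previous piece through the end of the word. */
      found = substr(text, prior_position, word_pos + word_len - prior_position);
      prior_position = word_pos + word_len;

      put found= word_pos= word_len= prior_position=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
run;
```

For the sample sentence this should give one piece per delimiter word, each ending in bat, cat or rat, with the second piece keeping its leading comma as noted above.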

allaboutsas
Calcite | Level 5

Thank you Astounding! I think that works.

allaboutsas
Calcite | Level 5

One more thing, is there a way to have the last found return 'and a rat abc' instead of 'and a rat'?

data _null_;
   ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
   text = 'The woods  have a bat, cat and a rat abc';
   start = 1;
   stop = length(text);

   /* Use PRXNEXT to find the first instance of the pattern, */
   /* then use DO WHILE to find all further instances.       */
   /* PRXNEXT changes the start parameter so that searching  */
   /* begins again after the last match.                     */
   call prxnext(ExpressionID, start, stop, text, position, length);
   prior_position = 1;
   do while (position > 0);
      fnd = prxposn(ExpressionID,1,text);
      found = substr(text, prior_position, position - prior_position + length(fnd)+1);
      prior_position = position + length(fnd)+1;
      put fnd= found= prior_position= position= length= start= stop=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;
run;

Astounding
PROC Star

No, it wouldn't be easy.  What would be easy would be to print one more line at the end:

Remaining text = abc
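
A minimal sketch of that extra line, assuming the loop from the post above is kept as it is: once the loop ends, prior_position points just past the last delimiter word, so anything from there to the end of text is the leftover. The variable name remaining is made up for illustration, and the + 1 in the loop still assumes that every match starts with exactly one character in front of the word.

```sas
data _null_;
   ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
   text = 'The woods  have a bat, cat and a rat abc';
   start = 1;
   stop = length(text);
   prior_position = 1;

   call prxnext(ExpressionID, start, stop, text, position, length);
   do while (position > 0);
      fnd = prxposn(ExpressionID, 1, text);
      found = substr(text, prior_position, position - prior_position + length(fnd) + 1);
      prior_position = position + length(fnd) + 1;
      put fnd= found=;
      call prxnext(ExpressionID, start, stop, text, position, length);
   end;

   /* Print whatever is left after the last delimiter word, if anything. */
   if prior_position <= length(text) then do;
      remaining = strip(substr(text, prior_position));
      put 'Remaining text = ' remaining;
   end;
run;
```

The IF guard is there so that SUBSTR is not called with a start position beyond the end of the string when the text ends on a delimiter word.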
