I'm using some sample code from SAS here, but I want to get a different
result.
data _null_;
ExpressionID = prxparse('/(?:s|,?)([crb]at) ?(?:,)?/');
text = 'The woods have a bat, cat and a rat';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, text, position, length);
do while (position > 0);
fnd = prxposn(ExpressionID,1,text);
found = substr(text, position, length);
put fnd= found= position= length= start= stop=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
run;
/*The following lines are written to the SAS log:*/
/* found=bat position=18 length=3*/
/* found=cat position=23 length=3*/
/* found=rat position=34 length=3*/
What I want to get is this instead:
Found=The woods have a bat
Found= cat and
Found= a rat
Basically, I want to use each found word as a delimiter to split the string into parts
(as many as it finds, 3 in this case).
What's your delimiter? If it was bat/cat/hat, your result would be:
the woods have a bat
cat
and a rat
or possibly
the woods have a bat
cat and a
rat
You are right, should be:
the woods have a bat
cat
and a rat
Forget the sample code from SAS. What are you trying to accomplish overall? Can you provide more than one example? Would sat also be a delimiter?
Let's add to the list. Which of these should be considered delimiters?
at
cats
matter
Secretariat
wheat
I have to use prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/'); which means only cat, rat and bat are delimiters (words). I just noticed a typo in the original post: it should be \s, which matches whitespace. The real program is much more complicated than this one. The goal is to get substrings from the big string by breaking it at any of these 3 words (in this sample; the real case has many more).
I know that in this case a word like brat will also match; that is fine. Thanks.
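To make the delimiter question concrete, here is a minimal sketch that runs the corrected pattern against the candidate words listed above, plus brat. The variable names word and hit are made up for this illustration. Because the leading group can match an empty string, the pattern should fire on any text that contains cat, rat or bat anywhere, so whole-word matching is not enforced.

```sas
data _null_;
  /* Same pattern as in the thread, with the corrected \s */
  ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
  length word $ 12;
  do word = 'at', 'cats', 'matter', 'Secretariat', 'wheat', 'brat';
    /* PRXMATCH returns the start position of the first match, or 0 if there is none */
    hit = prxmatch(ExpressionID, word);
    put word= hit=;
  end;
run;
```

A nonzero hit means the word would act as a delimiter in the splitting program. If embedded matches such as the rat in brat ever become a problem, anchoring the capture with \b on both sides is one option, but that changes the pattern the poster said they have to use.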
OK, keeping your program as intact as possible ...
Add one statement just before DO WHILE:
prior_position = 1;
Then inside the DO WHILE loop, replace FOUND= with:
found = substr(text, prior_position, position - prior_position + 3);
prior_position = position + 3;
You may get ", cat" instead of "cat", but that's probably a decent result.
Good luck.
Thank you Astounding! I think that works.
One more thing, is there a way to have the last found return 'and a rat abc' instead of 'and a rat'?
data _null_;
ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
text = 'The woods have a bat, cat and a rat abc';
start = 1;
stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, text, position, length);
prior_position = 1;
do while (position > 0);
fnd = prxposn(ExpressionID,1,text);
found = substr(text, prior_position, position - prior_position + length(fnd)+1);
prior_position = position + length(fnd)+1;
put fnd= found= prior_position position= length= start= stop=;
call prxnext(ExpressionID, start, stop, text, position, length);
end;
run;
No, it wouldn't be easy. What would be easy would be to print one more line at the end:
Remaining text = abc
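A minimal sketch of that extra line, assuming the loop is kept exactly as in the program above: once PRXNEXT stops finding matches, prior_position points just past the last piece that was written, so whatever follows it is the remainder. The variable name remaining is invented for this example, and STRIP is only there to drop the blank in front of abc.

```sas
data _null_;
  ExpressionID = prxparse('/(?:\s|,?)([crb]at) ?(?:,)?/');
  text = 'The woods have a bat, cat and a rat abc';
  start = 1;
  stop = length(text);
  prior_position = 1;
  call prxnext(ExpressionID, start, stop, text, position, length);
  do while (position > 0);
    fnd = prxposn(ExpressionID, 1, text);
    /* Same arithmetic as the program above: assumes one character precedes the word in each match */
    found = substr(text, prior_position, position - prior_position + length(fnd) + 1);
    prior_position = position + length(fnd) + 1;
    put fnd= found=;
    call prxnext(ExpressionID, start, stop, text, position, length);
  end;
  /* Assumption: anything after the last piece counts as the remaining text */
  if prior_position <= length(text) then do;
    remaining = strip(substr(text, prior_position));
    put 'Remaining text = ' remaining;
  end;
run;
```

The IF guard skips the extra line when the last match ends exactly at the end of the string, which also keeps SUBSTR from being called with a start position past the end of text.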