Hello,
I am having trouble parsing strings that have leading blanks with CALL PRXNEXT when the first match is zero-length. This is the string to parse:
foo foo foo foo foo
Here is the code:
data read;
   length var $50;
   /* The $CHAR informat keeps the leading blanks */
   input var $char50.;
   datalines;
foo foo foo foo foo
;
run;
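data regex;
   set read;
   /* Group 1: a (possibly empty) run of lowercase letters or hyphens that follows
      the start of the string or a consumed blank, and precedes a blank or the end */
   regex = prxparse("/(?:^| )([a-z\-]*)(?= |$)/");
   start = 1;
   stop = length(var);
   /* Find the first match; pos is 0 when there is no match */
   call prxnext(regex, start, stop, trim(var), pos, len);
   do while (pos GT 0);
      var_tall = prxposn(regex, 1, trim(var));
      output;
      /* Look for the next match; PRXNEXT updates start */
      call prxnext(regex, start, stop, trim(var), pos, len);
   end;
run;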
The problem is that:
- only twelve matches are found, although there are thirteen;
- the first zero-length match seems to overwrite the first non-zero-length match.
regex101.com finds all 13 matches, as expected. Is this a bug in SAS, or in my code? Thank you in advance for your help.
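This version works, for some reason. It differs only in the leading part of the pattern, which uses the zero-width lookbehinds (?:(?<=^)|(?<= )) instead of (?:^| ), so the blank is no longer consumed by the match: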
data read;
   length var $50;
   /* The $CHAR informat keeps the leading blanks */
   input var $char50.;
   datalines;
foo foo foo foo foo
;
run;
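data regex;
   set read;
   /* Same as before, but the word must be preceded (zero-width) by the
      start of the string or a blank; the blank itself is not consumed */
   regex = prxparse("/(?:(?<=^)|(?<= ))([a-z\-]*)(?= |$)/");
   start = 1;
   stop = length(var);
   /* Find the first match; pos is 0 when there is no match */
   call prxnext(regex, start, stop, trim(var), pos, len);
   do while (pos GT 0);
      var_tall = prxposn(regex, 1, trim(var));
      output;
      /* Look for the next match; PRXNEXT updates start */
      call prxnext(regex, start, stop, trim(var), pos, len);
   end;
run;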
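To see exactly which match goes missing with the first pattern, it may help to log every match that CALL PRXNEXT reports. The sketch below is a diagnostic variant of the first DATA step, not a fix. It assumes a hypothetical test string "  foo  foo" (two leading blanks plus a double blank, chosen only for illustration) and writes to a hypothetical data set named matches. For each match it logs a counter n together with pos, len and the updated start. A zero-length match shows up as len=0, so comparing the log with the match list on regex101.com should make the skipped match visible.

```sas
/* Hypothetical test string: two leading blanks and a double blank */
data read;
   length var $50;
   input var $char50.;   /* $CHAR keeps the leading blanks */
   datalines;
  foo  foo
;
run;

/* Diagnostic sketch: record every match CALL PRXNEXT reports */
data matches(keep=n pos len group1);
   set read;
   length group1 $50;
   regex = prxparse("/(?:^| )([a-z\-]*)(?= |$)/");
   start = 1;
   stop = length(var);
   n = 0;                                   /* match counter for this observation */
   call prxnext(regex, start, stop, trim(var), pos, len);
   do while (pos > 0);
      n + 1;
      group1 = prxposn(regex, 1, trim(var)); /* blank when group 1 matched nothing */
      put n= pos= len= start=;               /* len=0 marks a zero-length match */
      output;
      call prxnext(regex, start, stop, trim(var), pos, len);
   end;
run;
```

The log lines can then be lined up one by one against the matches regex101.com lists for the same string and pattern.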