Hello,
I am having trouble parsing a string with leading blanks using CALL PRXNEXT when the first match is zero-length. This is the string to parse:
foo foo foo foo foo
and here is the code snippet:
data read;
    length var $50;
    input var $char50.;  /* $CHARw. keeps leading blanks */
    datalines;
foo foo foo foo foo
;
run;
data regex;
    set read;
    regex = prxparse("/(?:^| )([a-z\-]*)(?= |$)/");
    start = 1;
    stop = length(var);
    /* find the first match, then loop until PRXNEXT returns pos = 0 */
    call prxnext(regex, start, stop, trim(var), pos, len);
    do while (pos GT 0);
        var_tall = prxposn(regex, 1, trim(var));
        output;
        call prxnext(regex, start, stop, trim(var), pos, len);
    end;
run;
The problem is that:
- only twelve matches are found while there are thirteen present
- the first zero-length match seems to overwrite the first non-zero one
regex101.com finds 13 matches as expected. Is this a bug in SAS? Or in my code? Thank you for your help in advance.
This works for some reason:
data read;
    length var $50;
    input var $char50.;  /* $CHARw. keeps leading blanks */
    datalines;
foo foo foo foo foo
;
run;
data regex;
    set read;
    /* lookbehinds instead of (?:^| ), so the leading blank is not consumed by the match */
    regex = prxparse("/(?:(?<=^)|(?<= ))([a-z\-]*)(?= |$)/");
    start = 1;
    stop = length(var);
    call prxnext(regex, start, stop, trim(var), pos, len);
    do while (pos GT 0);
        var_tall = prxposn(regex, 1, trim(var));
        output;
        call prxnext(regex, start, stop, trim(var), pos, len);
    end;
run;
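If the goal is just one row per blank-delimited token, including the empty tokens between consecutive blanks, a regex-free approach may be worth trying. This is a minimal sketch that swaps PRXNEXT for COUNTW and SCAN with the M modifier, under which leading, trailing, and consecutive delimiters count as zero-length words. The data set name tokens and the variables token, n_tokens, and i are made up for illustration. Note that it splits purely on blanks and does not restrict tokens to [a-z\-], so the result should be checked against the PRXNEXT output on the real data.

```sas
data tokens;
    set read;
    length token $50;
    /* 'm' modifier: leading, trailing, and consecutive delimiters
       produce zero-length words instead of being skipped */
    n_tokens = countw(trim(var), ' ', 'm');
    do i = 1 to n_tokens;
        token = scan(trim(var), i, ' ', 'm');
        output;
    end;
run;
```

TRIM is applied first so that the padding blanks at the end of the $50 variable do not generate extra empty tokens.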