Hello Community!
I am trying to extract the strings between parentheses using a regular expression. I was able to extract the first occurrence, but I want all the values when multiple pairs of parentheses exist.
data have;
  infile datalines truncover;
  length char_var $100.;
  input char_var $1-100;
  if _n_=1 then do;
    retain re;
    re = prxparse('/\(([^()]*)\)/'); /* How to define my pattern here?*/
    if missing(re) then do;
      putlog 'ERROR: regex is malformed';
      stop;
    end;
  end;
  if prxmatch(re,char_var) then do;
    char_var_new = prxchange('s/\(([^\)]+)\)//i',-1,char_var); /* I want all the values that were removed here */
    code1 = prxposn( re, 1, char_var);
    code2 = prxposn( re, 2, char_var);
    /* call prxposn( re, 1, endgstart, endglen);*/
    /* code = substr(char_var,endgstart,endglen);*/
    output;
  end;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
run;
The code below should give you the idea.
The RegEx uses a positive lookbehind, (?<=\(), and a positive lookahead, (?=\)), so the match itself contains only the text between the parentheses.
You don't need an _n_=1 clause for the prxparse() call. SAS compiles the RegEx only once when you use the syntax below.
data have;
infile datalines truncover;
length char_var $100.;
input char_var $1-100;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
data want;
  _inrowId = _n_;
  set have;
  _prxid = prxparse('/(?<=\().+?(?=\))/oi');
  _start = 1;
  _stop = length(char_var);
  /* Use PRXNEXT to find the first instance of the pattern, */
  /* then use DO WHILE to find all further instances. */
  /* PRXNEXT changes the _start parameter so that searching */
  /* begins again after the last match. */
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  if _pos <= 0 then output;
  else do while (_pos > 0);
    found = substr(char_var, _pos, _len);
    put found= _pos= _len=;
    output;
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
run;
proc print data=want;
run;
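If one row per input record is preferred over one row per match, the same PRXNEXT loop can collect the values into a single variable. The following is only a sketch under assumptions: the dataset name want_wide, the variable all_found, its length of 200 and the '|' separator are illustrative choices, not part of the answer above. It uses the pattern from the original question, which matches the parentheses as well, so one character is trimmed from each end of every match.

```sas
data want_wide;                      /* hypothetical output dataset name */
  set have;
  length all_found $200;             /* assumed to be long enough for all values */
  _prxid = prxparse('/\(([^()]*)\)/');
  _start = 1;
  _stop = length(char_var);
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  do while (_pos > 0);
    /* _pos and _len include both parentheses; skip empty pairs "()" */
    if _len > 2 then
      all_found = catx('|', all_found, substr(char_var, _pos + 1, _len - 2));
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
  drop _:;                           /* drop the helper variables */
run;
```

For the last sample row, 11(00),33(111), this should leave 00|111 in all_found; rows without parentheses keep all_found blank.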
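Like this?
/* The optional group includes the text to skip, so a second pair is found even when it is not adjacent to the first. */
re = prxparse('/ \( ( [^()]+ ) \) ( [^()]* \( ( [^()]+ ) \) )? /x');
if prxmatch(re, char_var) then do; /* PRXPOSN needs a preceding match */
  code1 = prxposn( re, 1, char_var);
  code2 = prxposn( re, 3, char_var);
end;
or, with a non-capturing group,
re = prxparse('/ \( ( [^()]+ ) \) (?: [^()]* \( ( [^()]+ ) \) )? /x');
if prxmatch(re, char_var) then do;
  code1 = prxposn( re, 1, char_var);
  code2 = prxposn( re, 2, char_var);
end;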
Sir @Patrick, this is the neatest answer by virtue of its solution, comments and approach. Most will/should understand the exquisite algo-rhythm of yours. Priceless. Thank you 1E6
PS I really do like the O modifier forcing the RegEx to compile only once
The O modifier is unnecessary in this case anyway. It only matters when the RegEx string contains variables; if the string is a constant, the expression is compiled only once.
@s_lassen wrote:
The O modifier is unnecessary in this case anyway. It only matters when the RegEx string contains variables; if the string is a constant, the expression is compiled only once.
Very true. I just made it a coding habit for myself to always use it unless there is a reason not to.
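To illustrate the case where the O modifier does matter, here is a minimal sketch in which the pattern is passed to PRXPARSE through a variable instead of a string constant. The variable names pattern and first_code are made up for this example, and the pattern is the one from the original question. Because the argument is not a constant, the /o option is what tells SAS to compile the expression once and reuse it on later iterations.

```sas
data _null_;
  set have;
  length pattern $40;                /* hypothetical variable holding the RegEx */
  pattern = '/\(([^()]*)\)/o';       /* /o: compile once although the argument is not a constant */
  re = prxparse(trim(pattern));
  if prxmatch(re, char_var) then do;
    first_code = prxposn(re, 1, char_var);   /* text inside the first pair of parentheses */
    put char_var= first_code=;
  end;
run;
```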
Thank you! This is very helpful and informative.