Hello Community!
I am trying to extract the strings between parentheses using a regular expression. I was able to extract the first occurrence, but I want all the values when multiple pairs of parentheses exist.
data have;
  infile datalines truncover;
  length char_var $100.;
  input char_var $1-100;
  if _n_=1 then do;
    retain re;
    re = prxparse('/\(([^()]*)\)/'); /* How to define my pattern here?*/
    if missing(re) then do;
      putlog 'ERROR: regex is malformed';
      stop;
    end;
  end;
  if prxmatch(re,char_var) then do;
    char_var_new = prxchange('s/\(([^\)]+)\)//i',-1,char_var); /* I want all the values that were removed here */
    code1 = prxposn( re, 1, char_var);
    code2 = prxposn( re, 2, char_var);
    /* call prxposn( re, 1, endgstart, endglen);*/
    /* code = substr(char_var,endgstart,endglen);*/
    output;
  end;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
run;
The code below should give you the idea.
The RegEx uses a positive lookbehind, (?<=\(), and a positive lookahead, (?=\)), so each match contains only the text between the parentheses.
You don't need an _n_=1 clause for the prxparse() call. SAS also compiles the RegEx only once when you use the syntax below.
data have;
  infile datalines truncover;
  length char_var $100.;
  input char_var $1-100;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
data want;
  _inrowId=_n_;
  set have;
  _prxid = prxparse('/(?<=\().+?(?=\))/oi');
  _start = 1;
  _stop = length(char_var);
  /* Use PRXNEXT to find the first instance of the pattern, */
  /* then use DO WHILE to find all further instances. */
  /* PRXNEXT changes the _start parameter so that searching */
  /* begins again after the last match. */
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  if _pos<=0 then output;
  else do while (_pos > 0);
    found = substr(char_var, _pos, _len);
    put found= _pos= _len=;
    output;
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
run;
proc print data=want;
run;
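If one row per input record is preferred over one row per match, the same PRXNEXT loop can collect the matches into a single variable. This is only a minimal sketch, not part of the answer above: the dataset name want_wide, the variable all_found, its length of $200, and the '|' separator are all assumptions chosen for illustration. It matches the parentheses themselves with a simpler pattern and trims them off with SUBSTR.

```sas
/* Sketch: collect every parenthesised value of a row into one variable. */
/* want_wide, all_found and the '|' separator are hypothetical choices.  */
data want_wide;
  set have;
  length all_found $200;
  /* [^()]+ requires at least one character inside the parentheses */
  _prxid = prxparse('/\([^()]+\)/');
  _start = 1;
  _stop  = length(char_var);
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  do while (_pos > 0);
    /* the match includes both parentheses: skip the first character */
    /* and shorten the length by two to keep only the inner text     */
    all_found = catx('|', all_found, substr(char_var, _pos + 1, _len - 2));
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
  drop _:;
run;
```

Rows without parentheses keep a blank all_found, and CATX ignores the blank initial value, so no leading separator should appear.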
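Like this?
/* The optional group skips any text without parentheses, then captures the next (...) */
re = prxparse('/ \( ( [^()]+ ) \) ( [^()]* \( ( [^()]+ ) \) )? /x');
code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 3, char_var);
or
/* Same pattern with a non-capturing optional group, so the second value is group 2 */
re = prxparse('/ \( ( [^()]+ ) \) (?: [^()]* \( ( [^()]+ ) \) )? /x');
code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 2, char_var);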
Sir @Patrick, neatest by virtue of solution, comments, and approach. Most will/should understand the exquisite algo-rhythm of yours. Priceless. Thank you 1E6.
PS: I really do like the O modifier forcing the RegEx to compile only once.
The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.
@s_lassen wrote:
The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.
Very true. I just made it a coding habit for myself to always use it unless there is a reason not to.
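To illustrate the distinction discussed above, here is a minimal sketch of a case where the O modifier would matter: the pattern is assembled from variables rather than written as a constant. The variables open_ch and close_ch and the output variable first_found are hypothetical names used only for this example, and the data step assumes the dataset have from earlier in the thread.

```sas
/* Sketch: a pattern built from variables, where /o asks SAS to compile it once. */
/* open_ch, close_ch and first_found are hypothetical names.                    */
data _null_;
  set have;
  length open_ch close_ch $1;
  open_ch  = '(';
  close_ch = ')';
  /* The argument is an expression, not a constant, so without /o      */
  /* the pattern would be recompiled on every iteration of the step.   */
  /* The concatenation yields /\(([^()]+)\)/o                          */
  _prxid = prxparse(cats('/\', open_ch, '([^()]+)\', close_ch, '/o'));
  if prxmatch(_prxid, char_var) then do;
    first_found = prxposn(_prxid, 1, char_var);
    put first_found=;
  end;
run;
```

One consequence of this design: with /o the pattern is fixed after the first compile, so later changes to open_ch or close_ch would be ignored. If the pattern really needs to vary by row, leave the modifier off.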
Thank you! This is very helpful and informative.