SuryaKiran
Meteorite | Level 14

Hello Community!

 

I am trying to extract the strings between parentheses using a regular expression. I was able to extract the first occurrence, but I want all the values when multiple sets of parentheses exist.

 

data have;
infile datalines truncover;
length char_var $100.;
input char_var $1-100;

if _n_=1 then do;
  retain re;
  re = prxparse('/\(([^()]*)\)/'); /* How to define my pattern here?*/
  if missing(re) then do;
    putlog 'ERROR: regex is malformed';
    stop;
  end;
end;

if prxmatch(re,char_var) then do;
  char_var_new = prxchange('s/\(([^\)]+)\)//i',-1,char_var); /* I want all the values that were removed here */
  code1 = prxposn( re, 1, char_var);
  code2 = prxposn( re, 2, char_var);
  /* call prxposn( re, 1, endgstart, endglen);*/
  /* code = substr(char_var,endgstart,endglen);*/
  output;
end;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
run;
Thanks,
Suryakiran
1 ACCEPTED SOLUTION

Patrick
Opal | Level 21

Below code should give you the idea.

The RegEx uses a positive lookbehind and a positive lookahead, so the parentheses themselves are not part of the match.

 

You don't need an _n_=1 clause for the prxparse() bit. SAS also compiles the RegEx only once when you use the syntax below.

data have;
  infile datalines truncover;
  length char_var $100.;
  input char_var $1-100;
  datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;

data want;
  _inrowId=_n_;
  set have;
  _prxid = prxparse('/(?<=\().+?(?=\))/oi');
  _start = 1;
  _stop = length(char_var);

  /* Use PRXNEXT to find the first instance of the pattern, */
  /* then use DO WHILE to find all further instances.       */
  /* PRXNEXT changes the _start parameter so that searching  */
  /* begins again after the last match.                     */
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  if _pos<=0 then output;
  else do while (_pos > 0);
    found = substr(char_var, _pos, _len);
    put found= _pos= _len=;
    output;
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
run;

proc print data=want;
run;
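
If you would rather keep one row per input record, the same PRXNEXT loop can collect every match into a single variable. This is only a sketch under a few assumptions of mine: the names want_wide and all_found, the $200 length, and the '|' delimiter are not from the original question, and the HAVE data set is the one created above.

```sas
data want_wide;
  set have;
  length all_found $200;                 /* hypothetical name and length */
  _prxid = prxparse('/(?<=\().+?(?=\))/');
  _start = 1;
  _stop  = length(char_var);

  /* Walk through every match in the row and append it to all_found. */
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  do while (_pos > 0);
    all_found = catx('|', all_found, substr(char_var, _pos, _len));
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;

  drop _:;                               /* drop the helper variables */
run;
```

With the sample data this should give all_found = 00|111 for the last row and a blank value for the row without parentheses. One caveat: because the pattern uses .+? it expects at least one character between the parentheses, so an empty pair such as () may produce an unexpected match; [^()]+ in place of .+? would be a stricter alternative.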

 

 


6 REPLIES
ChrisNZ
Tourmaline | Level 20

Like this?

re = prxparse('/ \( ( [^()]+ ) \) (?: .*? ( \( ( [^()]+ ) \) ) )? /x');

code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 3, char_var);

or

re = prxparse('/ \( ( [^()]+ ) \) (?: .*? \( ( [^()]+ ) \) )? /x');

code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 2, char_var);
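
For completeness, here is how the two-capture idea might look as a full step. Treat it as a sketch: it assumes the HAVE data set from the question, the output name two_codes is made up, and it only ever picks up the first two parenthesised values. The lazy .*? sits inside the optional group so that the regex engine actually tries to reach a second pair of parentheses before giving up on it.

```sas
data two_codes;
  set have;
  length code1 code2 $100;
  /* The x modifier lets the pattern contain spaces for readability. */
  re = prxparse('/ \( ( [^()]+ ) \) (?: .*? \( ( [^()]+ ) \) )? /x');

  /* PRXPOSN reads the capture buffers filled by the preceding PRXMATCH. */
  if prxmatch(re, char_var) then do;
    code1 = prxposn(re, 1, char_var);
    code2 = prxposn(re, 2, char_var);
  end;

  drop re;
run;
```

Rows with a single pair of parentheses should end up with code2 blank. For an unknown number of values per row, a PRXNEXT loop is the more general tool.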

 

novinosrin
Tourmaline | Level 20

Sir @Patrick  Neatest by virtue of solution, comments and approach. Most will/should understand the exquisite algo-rhythm of yours. Priceless. Thank you 1E6

 

PS I really do like the O modifier forcing the RegEx to compile only once

s_lassen
Meteorite | Level 14

The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.
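
To illustrate where the O modifier does make a difference, here is a small sketch with a pattern that is built from a variable. The variable names and the sample rows are made up for the illustration. As I understand the documentation, the call with /o keeps returning the regex compiled on the first row, while the call without it compiles a new regex on every row.

```sas
data _null_;
  length pattern $20 text $20;
  input pattern $ text $;

  /* Non-constant pattern without o: compiled again for each row. */
  id_plain = prxparse(cats('/', pattern, '/'));
  /* Non-constant pattern with o: compiled once, then reused as is. */
  id_once  = prxparse(cats('/', pattern, '/o'));

  hit_plain = (prxmatch(id_plain, text) > 0);
  hit_once  = (prxmatch(id_once,  text) > 0);
  put pattern= text= hit_plain= hit_once=;

  /* Release the per-row regex so compiled patterns do not pile up. */
  call prxfree(id_plain);
  datalines;
abc xxabcxx
def xxdefxx
;
run;
```

Comparing hit_plain and hit_once in the log for the second row shows whether the first pattern was reused. So with /o on a changing pattern you can silently keep matching against the first row's pattern, which is the case to watch out for.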

Patrick
Opal | Level 21

@s_lassen wrote:

The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.


Very true. I just made it a coding habit for myself to always use it unless there is a reason not to.

SuryaKiran
Meteorite | Level 14

Thank you! This is very helpful and informative. 

Thanks,
Suryakiran

