SuryaKiran
Meteorite | Level 14

Hello Community!

 

I am trying to extract the strings between parentheses using a regular expression. I can extract the first occurrence, but I want all of the values when multiple pairs of parentheses exist.

 

data have;
infile datalines truncover;
length char_var $100.;
input char_var $1-100;

if _n_=1 then do;
 retain re;

 re = prxparse('/\(([^()]*)\)/'); /* How to define my pattern here?*/

if missing(re) then do;
 putlog 'ERROR: regex is malformed';
 stop;
 end;
 end;

if prxmatch(re,char_var) then do;
char_var_new = prxchange('s/\(([^\)]+)\)//i',-1,char_var); /* I want all the values that were removed here */
code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 2, char_var);
/* call prxposn( re, 1, endgstart, endglen);*/
/* code = substr(char_var,endgstart,endglen);*/
output;
end;
datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;
run;

Thanks,
Suryakiran
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below code should give you the idea.

The RegEx uses a positive lookbehind and a positive lookahead, so the match contains only the text between the parentheses.

 

You don't need an _n_=1 clause for the prxparse() call. SAS also compiles the RegEx only once when you use the syntax below.

data have;
  infile datalines truncover;
  length char_var $100.;
  input char_var $1-100;
  datalines;
1111, 2222 (0000)
1111 (11, 22, 33, 44)
90658
11(00),33(111)
;

data want;
  _inrowId=_n_;
  set have;
  _prxid = prxparse('/(?<=\().+?(?=\))/oi');
  _start = 1;
  _stop = length(char_var);

  /* Use PRXNEXT to find the first instance of the pattern, */
  /* then use DO WHILE to find all further instances.       */
  /* PRXNEXT changes the _start parameter so that searching  */
  /* begins again after the last match.                     */
  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  if _pos<=0 then output;
  else do while (_pos > 0);
    found = substr(char_var, _pos, _len);
    put found= _pos= _len=;
    output;
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;
run;

proc print data=want;
run;
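If one row per input record is preferred, the same PRXNEXT loop can collect every match into a single variable. The following is only a sketch under assumptions that are not in the original post: the output data set name want_wide, the variable all_found, its length of 200, and the '|' delimiter are all made up for illustration. It reuses the pattern from the question, so each match includes the parentheses and SUBSTR trims them off.

```sas
data want_wide;
  set have;
  length all_found $200;              /* assumed length, adjust as needed */
  _prxid = prxparse('/\(([^()]*)\)/'); /* pattern taken from the question */
  _start = 1;
  _stop  = length(char_var);

  call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  do while (_pos > 0);
    /* The match spans "(" through ")": skip 1 character, drop 2. */
    /* Empty parentheses (_len = 2) are skipped to avoid a        */
    /* zero-length SUBSTR.                                        */
    if _len > 2 then
      all_found = catx('|', all_found, substr(char_var, _pos + 1, _len - 2));
    call prxnext(_prxid, _start, _stop, char_var, _pos, _len);
  end;

  drop _:;                            /* remove the helper variables */
run;
```

Rows without parentheses keep a blank all_found, which mirrors the way the code above still outputs rows that have no match.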

 

 


6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

Like this?

re = prxparse('/ \( ( [^()]+ ) \) ( .*? \( ( [^()]+ ) \) )? /x');
/* .*? sits inside the optional group so the regex actually searches ahead for a second pair */

code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 3, char_var);

or

re = prxparse('/ \( ( [^()]+ ) \) (?: .*? \( ( [^()]+ ) \) )? /x');

code1 = prxposn( re, 1, char_var);
code2 = prxposn( re, 2, char_var);

 

novinosrin
Tourmaline | Level 20

Sir @Patrick, neatest by virtue of solution, comments and approach. Most will/should understand the exquisite algo-rhythm of yours. Priceless. Thank you 1E6

 

PS I really do like the O modifier forcing the RegEx to compile only once

s_lassen
Meteorite | Level 14

The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.

Patrick
Opal | Level 21

@s_lassen wrote:

The O modifier is unnecessary in this case anyway. It only matters when the RX string contains variables; if the string is a constant, the expression is compiled only once.


Very true. I just made it a coding habit for myself to always use it unless there is a reason not to.
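
To see where the O modifier could matter, here is a small sketch in which the pattern comes from a variable. The data set patterns and the variables pattern, text, id_each and id_once are invented for this illustration and are not from the thread. It assumes the documented PRXPARSE behaviour that an expression carrying /o is compiled only on the first call.

```sas
/* Hypothetical data: each row carries its own pattern text */
data patterns;
  infile datalines truncover;
  input pattern $ 1-10 text $ 12-30;
  datalines;
/\d+/      abc 123
/[a-z]+/   abc 123
;

data compare;
  set patterns;

  /* No /o and a non-constant argument: compiled again for each row */
  id_each = prxparse(strip(pattern));
  call prxsubstr(id_each, text, pos_each, len_each);
  call prxfree(id_each);   /* release the per-row compilation */

  /* With /o appended: compiled on the first call and then reused */
  id_once = prxparse(cats(pattern, 'o'));
  call prxsubstr(id_once, text, pos_once, len_once);
run;

proc print data=compare;
  var pattern text pos_each len_each pos_once len_once;
run;
```

If /o behaves as documented, the second row's pos_once and len_once should still reflect the first row's pattern, while pos_each and len_each follow the pattern on the current row. With a constant pattern string, as in the accepted solution, the two forms should not differ.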

SuryaKiran
Meteorite | Level 14

Thank you! This is very helpful and informative. 

Thanks,
Suryakiran
