
regular expressions novice

Contributor
Posts: 53

regular expressions novice

Hi, I need to parse a comment field that has TWO sets of quoted phrases - the kicker is that I only need the contents of the SECOND quoted phrase.

For example:

Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...

Is this possible using Perl regular expressions?

TIA.

Regular Contributor
Posts: 171

regular expressions novice

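Here is an easy solution that does not use regular expressions. With '"' as the only delimiter, SCAN counts a negative position from the right, so piece -2 is the text inside the last pair of quotes (this assumes some text follows the closing quote).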

data test;
     comment = 'Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...';
     quote = scan(comment, -2, '"');
run;
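To see why position -2 lands on the right phrase, it can help to list every piece SCAN sees when the double quote is the only delimiter. This is just a sketch for illustration; the variable names n, i and piece are my own and not part of the solution above.

```sas
data _null_;
     length comment $ 200 piece $ 100;
     comment = 'Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...';

     /* countw with the same delimiter reports how many pieces SCAN will find */
     n = countw(comment, '"');

     /* write each piece to the log with its position */
     do i = 1 to n;
          piece = scan(comment, i, '"');
          put i= piece=;
     end;
run;
```

For this sample the quoted phrases are pieces 2 and 4 of 5, which is why counting two from the right (or four from the left) picks up the second quote.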

Frequent Contributor
Posts: 104

Re: regular expressions novice

Here's a code snippet that should work, typos notwithstanding.  Modify to suit your needs.

data _null_;
     length str s1 s2 $ 1000;   /* or whatever length  */
     retain regid;

     if _n_ = 1 then do;
          regid = prxparse( '/^.*"(.*)".*"(.*)"/' );
          if missing( regid ) then do;     /*  prxparse returns a missing value for a bad pattern  */
               put 'improper reg expression';
               stop;
          end;
     end;

     input;
     str = _infile_;     /*  take the whole line; list input would stop at the first blank  */

     if prxmatch( regid, str ) then do;     /*  prxposn only works after a successful match  */
          /*  quick & short - the prxposn function returns the capture buffer text  */
          s1 = prxposn( regid, 1, str );
          s2 = prxposn( regid, 2, str );

          /*  alternative form - call prxposn returns position and length instead  */
          call prxposn( regid, 2, pos, len );
          if len > 0 then s2 = substr( str, pos, len );
     end;

     put s1= s2=;
cards;
Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...
;;;;

The regular expression '/^.*"(.*)".*"(.*)"/' deconstructs as follows:

/ opens the regular expression (the standard Perl delimiter).

^ anchors the match at the beginning of the string.

.* matches any character 0 or more times. It is greedy, so it takes as much as it can and backs off only as far as needed for the rest of the pattern to match.

" matches a literal double quote - the beginning of the 1st quoted string.

(.*) is the 1st capture buffer, marked by (). Its content is whatever .* matches between the quotes.

" finishes the 1st quotation.

.* says anything can separate the two quoted strings.

"(.*)" is the 2nd quoted string, with its contents in the 2nd capture buffer.

/ closes off the regular expression.
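One caveat with the greedy .* pieces: if a comment ever holds three or more quoted phrases, the two capture buffers end up with the last two phrases rather than the first two. Below is a minimal sketch of a variant that uses [^"]* (any run of characters that are not a double quote) so the match cannot run past a quote. The second data line is a made-up test case, and the names rx and second are mine.

```sas
data _null_;
     length str second $ 1000;
     retain rx;

     /* skip the 1st quoted phrase, then capture the contents of the 2nd */
     if _n_ = 1 then rx = prxparse( '/"[^"]*"[^"]*"([^"]*)"/' );

     input;
     str = _infile_;     /* take the whole input line */

     /* prxposn returns the capture buffer from the most recent match */
     if prxmatch( rx, str ) then second = prxposn( rx, 1, str );

     put second=;
cards;
Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...
A "one" B "two" C "three" D
;;;;
```

On the made-up second line this version picks up two, whereas the greedy pattern would leave three in its second capture buffer.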

Hope this helps.  Enjoy.

Super User
Posts: 10,023

Re: regular expressions novice

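I agree with polingjw, since you only want the second quoted phrase. Counting from the left, that phrase is the 4th piece when the string is split on double quotes. The 'm' modifier tells SCAN to treat two adjacent delimiters as enclosing an empty word, so an empty first quote ("") does not shift the count.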

data test;
     comment = 'Blah blah Blah "Dont need this" yada yada yada "Definately need this" so forth and so on...';
     quote = scan(comment, 4, '"', 'm');
run;
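For anyone wondering what the 'm' modifier changes, here is a small sketch using a made-up comment whose first quoted phrase is empty. The variable names without_m and with_m are only for illustration.

```sas
data _null_;
     length comment $ 200 without_m with_m $ 100;

     /* hypothetical edge case: the first quoted phrase is empty */
     comment = 'Blah "" yada "Definately need this" so on';

     /* default: the two adjacent quotes count as a single delimiter */
     without_m = scan(comment, 4, '"');

     /* with m: the two adjacent quotes enclose an empty piece */
     with_m = scan(comment, 4, '"', 'm');

     put without_m= with_m=;
run;
```

Without the modifier the pieces shift left by one, so position 4 returns the trailing text instead of the second quoted phrase. With the modifier, position 4 is still the second quoted phrase.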

Ksharp

Message was edited by: xia keshan

Frequent Contributor
Posts: 139

Re: regular expressions novice

Thanks, everybody. Even I didn't know about the modifier in the SCAN function.
