SAS Programming

DATA Step, Macro, Functions and more
SK_11
Obsidian | Level 7

I want to extract text from a string using multiple patterns. I am getting the error "The PRXPARSE function call does not have enough arguments.".

 
data have;
  input street $80.;
datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
;
run;
 
 
 
%macro test;
%global p1 p2 p3 p4  ;
 
data want;
set have;
   pattern1 = "m/\d+\s[a-z]+\s[a-z]+/oi";
   pattern2 = "m/Pl|place/i";
pattern3 = "m/rd|road/i";
  pattern4 = "m/ave|avenue/i";
  
 
%do i=1 %to 4;
      call symputx(cats('p',&i), cats('pattern',&i.) , 'g');
%end;
 %do j=1 %to 4;
 
      ExpressionID = prxparse(&&p&j);
   call prxsubstr(ExpressionID, street, position, length);
   %if length> 0 %then 
   %do;
      match = substr(street, position, length);
  matchtype=cats('pattern',&j);
  output;
   %end;
 
   %end;
drop Pattern:;
run;
 
%mend;
%test;
Patrick
Opal | Level 21

Don't use macro language if not necessary. It only makes debugging harder. 

data have;
  input street $80.;
  datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
;
run;

data want;
  set have;
  if _n_=1 then
    do;
      pattern1="m/\d+\s[a-z]+\s[a-z]+/i";
      pattern2="m/Pl|place/i";
      pattern3="m/rd|road/i";
      pattern4="m/ave|avenue/i";
      array patterns{4} pattern1 - pattern4;
      array expr_id {4} _temporary_;
      do i=1 to dim(patterns);
        expr_id[i]=prxparse(patterns[i]);
      end;
      length matchtype $8;
      /* create variable match with same length as variable street */
      if 0 then match=street;
    end;

  do i=1 to dim(patterns);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then 
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop Pattern: i;
run;

proc print data=want;
run;

 Or even shorter:

data want;
  set have;
  if _n_=1 then
    do;
      array expr_id {4} _temporary_;
      expr_id[1]=prxparse("m/\d+\s[a-z]+\s[a-z]+/i");
      expr_id[2]=prxparse("m/Pl|place/i");
      expr_id[3]=prxparse("m/rd|road/i");
      expr_id[4]=prxparse("m/ave|avenue/i");
      /* create variable match with same length as variable street */
      if 0 then match=street;
      length matchtype $8;
    end;

  do i=1 to dim(expr_id);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then 
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop i;
run;
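
On the original error: the macro processor resolves &&p&j while the DATA step is still being compiled, but CALL SYMPUTX only assigns p1-p4 once the step executes. On a first run the %GLOBAL variables are therefore still empty, so SAS sees prxparse() with no argument. Two related points: cats('pattern',&i.) stores the text pattern1 rather than the regex itself, and %if length> 0 is evaluated by the macro processor against the literal text "length", not the DATA step variable. A minimal sketch of the timing (the macro name timing_demo and the sample regex are only for illustration):

```sas
/* Illustrative only: shows when a macro variable set by CALL SYMPUTX becomes visible. */
%macro timing_demo;
  %global p1;
  %let p1=;   /* start from an empty value so reruns behave like a first run */

  data _null_;
    /* Runs only when the step executes, i.e. after RUN; has been reached */
    call symputx('p1', 'm/road/i', 'g');
    /* Macro statements are processed while the step is still being compiled */
    %put NOTE: p1 during compilation is [&p1];
  run;

  /* The step has finished by now, so the assigned value is visible */
  %put NOTE: p1 after the step has run is [&p1];
%mend timing_demo;

%timing_demo
```

The first %PUT shows empty brackets and the second shows the assigned value, which is why a reference like &&p&j inside the same step cannot pick up what CALL SYMPUTX is about to store.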
SK_11
Obsidian | Level 7

Hi Patrick

Thanks for your prompt reply. Is there any way you can separate the pattern and prxsubstr code into two data steps? I want to use the same patterns for multiple data sets. Thanks a lot.

 

Patrick
Opal | Level 21

Something like the code below should do. 
I've modified your regexes by adding the word-boundary metacharacter \b so that your 2nd regex does not match the "pl" in "Maple Street".

data have;
  input street $80.;
  datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
711 Maple Street
;
run;

data patterns;
  input regex :$100.;
  datalines;
m/\d+\s[a-z]+\s[a-z]+/i
m/\b(Pl|place)\b/i
m/\b(rd|road)\b/i
m/\b(ave|avenue)\b/i
;
run;

data _null_;
  call symputx('n_patterns',nobs);
  stop;
  set patterns nobs=nobs;
run;

data want;
  set have;
  if _n_=1 then
    do;
      array expr_id {&n_patterns} _temporary_;
      do i=1 by 1 until(last);
        set patterns end=last;
        expr_id[i]=prxparse(strip(regex));
      end;
      /* create variable match with same length as variable street */
      if 0 then match=street;
      length matchtype $8;
    end;

  do i=1 to dim(expr_id);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then 
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop regex i;
run;

proc print data=want;
run;
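
To see what the word boundary changes, here is a small sketch that runs both versions of the 2nd pattern against the added test row. It calls PRXMATCH directly with the pattern text; the variable names loose and strict are made up for this example.

```sas
/* Sketch: original vs. word-boundary version of pattern 2 on one street value. */
data _null_;
  street = '711 Maple Street';
  /* Without \b the alternation finds "pl" inside "Maple" */
  loose  = prxmatch('m/Pl|place/i', street);
  /* With \b only a standalone "Pl" or "Place" qualifies, so nothing is found */
  strict = prxmatch('m/\b(Pl|place)\b/i', street);
  put loose= strict=;
run;
```

The log should show loose=7 (the position of "pl" in "Maple") and strict=0.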
SK_11
Obsidian | Level 7
Thanks a lot
Tom
Super User

You can avoid needing to create the macro variable with the number of patterns. Just make the array large enough for the maximum number of patterns you ever expect to have to handle. 

This example is set up to handle 9,999 patterns. But even 99,999 or more should not cause any trouble. Just make sure to adjust the array size and the length of the MATCHTYPE variable. (Or just keep the numeric loop counter variable instead.)

data want;
  set have;
* Create variable match with same length as variable street ;
  if 0 then match=street;
* Set length of MATCHTYPE long enough for up to 9999 patterns ;
  length matchtype $11;
* Make array large enough for 9999 patterns ;
  array expr_id [9999] _temporary_;
  if _n_=1 then do pattern=1 to nobs;
* Parse regex patterns into array ;
    set patterns nobs=nobs;
    expr_id[pattern]=prxparse(strip(regex));
  end;
* Output any matches  ;
  do pattern=1 to nobs;
    call prxsubstr(expr_id[pattern], street, position, length);
    if position> 0 then do;
      match=substr(street, position, length);
      matchtype=cats('pattern', pattern);
      output;
    end;
  end;
  drop regex pattern;
run; 
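
The parenthetical suggestion above, keeping the numeric loop counter instead of building MATCHTYPE, could look roughly like this. It is only a sketch and assumes the HAVE and PATTERNS data sets from earlier in the thread; the output name want_num is made up for illustration.

```sas
/* Sketch: same logic, but the numeric PATTERN counter identifies the match. */
data want_num;
  set have;
* Create variable match with same length as variable street ;
  if 0 then match=street;
* Array must be at least as large as the number of rows in PATTERNS ;
  array expr_id [9999] _temporary_;
  if _n_=1 then do pattern=1 to nobs;
    set patterns nobs=nobs;
    expr_id[pattern]=prxparse(strip(regex));
  end;
  do pattern=1 to nobs;
    call prxsubstr(expr_id[pattern], street, position, length);
    if position> 0 then do;
      match=substr(street, position, length);
* PATTERN holds the number of the matching regex at this point ;
      output;
    end;
  end;
* PATTERN is kept, so no character MATCHTYPE variable is needed ;
  drop regex;
run;
```

With a numeric identifier there is no character length to adjust when the number of patterns grows; only the array size still has to be large enough.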

 

Patrick
Opal | Level 21

@Tom Sure, that will work as well, but I can't see the harm in an additional simple data _null_ step that won't iterate through the data. 
