I want to extract text from a string using multiple patterns. I am getting the error "The PRXPARSE function call does not have enough arguments.".
Don't use macro language if not necessary. It only makes debugging harder.
data have;
input street $80.;
datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
;
run;
data want;
  set have;
  if _n_=1 then
    do;
      pattern1="m/\d+\s[a-z]+\s[a-z]+/i";
      pattern2="m/Pl|place/i";
      pattern3="m/rd|road/i";
      pattern4="m/ave|avenue/i";
      array patterns{4} pattern1 - pattern4;
      array expr_id {4} _temporary_;
      do i=1 to dim(patterns);
        expr_id[i]=prxparse(patterns[i]);
      end;
      length matchtype $8;
      /* create variable match with same length as variable street */
      if 0 then match=street;
    end;
  do i=1 to dim(patterns);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop Pattern: i;
run;
proc print data=want;
run;
Or even shorter:
data want;
  set have;
  if _n_=1 then
    do;
      array expr_id {4} _temporary_;
      expr_id[1]=prxparse("m/\d+\s[a-z]+\s[a-z]+/i");
      expr_id[2]=prxparse("m/Pl|place/i");
      expr_id[3]=prxparse("m/rd|road/i");
      expr_id[4]=prxparse("m/ave|avenue/i");
      /* create variable match with same length as variable street */
      if 0 then match=street;
      length matchtype $8;
    end;
  do i=1 to dim(expr_id);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop i;
run;
Hi Patrick
Thanks for your prompt reply. Is there any way you can separate the pattern and PRXSUBSTR code into two data steps? I want to use the same patterns for multiple data sets. Thanks a lot.
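Something like the code below should do.
I've modified your regular expressions by adding the word boundary metacharacter \b so that your 2nd regex does not match "Maple Street". Without it, the alternative Pl also matches the "pl" inside "Maple".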
data have;
input street $80.;
datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
711 Maple Street
;
run;
data patterns;
input regex :$100.;
datalines;
m/\d+\s[a-z]+\s[a-z]+/i
m/\b(Pl|place)\b/i
m/\b(rd|road)\b/i
m/\b(ave|avenue)\b/i
;
run;
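/* Store the number of patterns in macro variable n_patterns. */
/* NOBS= is populated at compile time, so the value is available */
/* before SET executes; STOP ends the step without reading any rows. */
data _null_;
  call symputx('n_patterns',nobs);
  stop;
  set patterns nobs=nobs;
run;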
data want;
  set have;
  if _n_=1 then
    do;
      array expr_id {&n_patterns} _temporary_;
      do i=1 by 1 until(last);
        set patterns end=last;
        expr_id[i]=prxparse(strip(regex));
      end;
      /* create variable match with same length as variable street */
      if 0 then match=street;
      length matchtype $8;
    end;
  do i=1 to dim(expr_id);
    call prxsubstr(expr_id[i], street, position, length);
    if position> 0 then
      do;
        match=substr(street, position, length);
        matchtype=cats('pattern', i);
        output;
      end;
  end;
  drop regex i;
run;
proc print data=want;
run;
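To see the effect of the \b word boundary in isolation, a small check like the following may help. It is only a sketch: the data set name boundary_check and the variables loose and strict are made up for illustration. PRXMATCH returns the position of the first match, or 0 when there is no match.

```sas
/* Sketch: compare the unanchored and the \b-anchored pattern. */
/* boundary_check, loose and strict are illustrative names only. */
data boundary_check;
  input street $80.;
  /* unanchored: "Pl" can match inside a longer word such as "Maple" */
  loose  = prxmatch('m/Pl|place/i', street);
  /* anchored: "Pl" or "place" must be a whole word */
  strict = prxmatch('m/\b(Pl|place)\b/i', street);
  datalines;
711 Maple Street
7493 Wilkes Place
;
run;

proc print data=boundary_check;
run;
```

For "711 Maple Street" the unanchored pattern should report a match while the anchored one returns 0. For "7493 Wilkes Place" both should match.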
You can avoid the macro variable that holds the number of patterns. Just make the array large enough for the maximum number of patterns you ever expect to handle.
This example is set up to handle 9,999 patterns, but even 99,999 or more should not cause any trouble. Just make sure to adjust the array size and the length of the MATCHTYPE variable accordingly. (Or just keep the numeric loop counter variable instead.)
data want;
  set have;
  * Create variable match with same length as variable street ;
  if 0 then match=street;
  * Set length of MATCHTYPE long enough for up to 9999 patterns ;
  length matchtype $11;
  * Make array large enough for 9999 patterns ;
  array expr_id [9999] _temporary_;
  if _n_=1 then do pattern=1 to nobs;
    * Parse regex patterns into array ;
    set patterns nobs=nobs;
    expr_id[pattern]=prxparse(strip(regex));
  end;
  * Output any matches ;
  do pattern=1 to nobs;
    call prxsubstr(expr_id[pattern], street, position, length);
    if position> 0 then do;
      match=substr(street, position, length);
      matchtype=cats('pattern', pattern);
      output;
    end;
  end;
  drop regex pattern;
run;
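The parenthetical suggestion of keeping the numeric loop counter could look roughly like this. It is a sketch rather than a tested replacement: it drops MATCHTYPE entirely and keeps PATTERN in the output as the match identifier, so no character length needs adjusting. The small HAVE and PATTERNS data sets are repeated only so the example runs on its own.

```sas
/* Sample data, repeated from the thread so the sketch is self-contained */
data have;
  input street $80.;
  datalines;
Bldg A 153 First Street
6789 64th Ave
4 Moritz Road
7493 Wilkes Place
711 Maple Street
;
run;

data patterns;
  input regex :$100.;
  datalines;
m/\d+\s[a-z]+\s[a-z]+/i
m/\b(Pl|place)\b/i
m/\b(rd|road)\b/i
m/\b(ave|avenue)\b/i
;
run;

data want;
  set have;
  /* Create variable match with same length as variable street */
  if 0 then match=street;
  /* Array size is an assumed upper bound on the number of patterns */
  array expr_id [9999] _temporary_;
  if _n_=1 then do pattern=1 to nobs;
    set patterns nobs=nobs;
    expr_id[pattern]=prxparse(strip(regex));
  end;
  do pattern=1 to nobs;
    call prxsubstr(expr_id[pattern], street, position, length);
    if position > 0 then do;
      match=substr(street, position, length);
      /* PATTERN is not dropped: it records which regex matched */
      output;
    end;
  end;
  drop regex;
run;
```

The only size limit left is the array dimension itself. A PATTERNS data set with more rows than that would cause an array subscript error.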
@Tom Sure, that will work as well, but I can't see the harm in an additional simple data _null_ step that doesn't iterate through the data.