gzr2mz39
Quartz | Level 8

In addition to having a long list of patterns (over 50) to check using regex, I need to check these patterns against more than 700,000 observations.

Does anyone have any advice for improving efficiency?

Here's the macro I'm using to accomplish this task:

%macro prx(pattern,serial);
b=prxparse("&pattern");
if prxmatch(b,serial_number)>0 then do;
check=1;
serial=&serial;
if (length(serial) = length(serial_number)) then check=2;
end;
%mend;

Thank you.

4 REPLIES
ChrisNZ
Tourmaline | Level 20

The first things that come to mind, without knowing more:

- Can you use functions like index() or similar? They are a lot cheaper than RegEx.

- Can you use else if to avoid further searching once a pattern is matched?

This may be cheaper too:

if prxmatch("&pattern",serial_number)>0 then do;
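As a rough sketch of the two suggestions above (not the original poster's actual patterns): the data set names serials and checked, the sample values, and the three patterns are all made up for illustration. Literal checks go first because they are cheap, and the else if chain stops testing as soon as one branch matches.

```sas
/* Hypothetical sample data; the serial_number values are invented */
data serials;
  length serial_number $20;
  input serial_number $;
  datalines;
ABC12345
XY-998877
ZZ000111
QQ123
;
run;

data checked;
  set serials;
  check = 0;
  /* Literal prefix: the =: operator compares only the leading characters */
  if serial_number =: 'ABC' then check = 1;
  /* Literal substring: index() returns the position, or 0 if not found */
  else if index(serial_number, 'XY-') > 0 then check = 1;
  /* Fall through to a regex only when the pattern really needs one */
  else if prxmatch('/^ZZ\d{6}\s*$/', serial_number) > 0 then check = 1;
run;
```

Ordering the branches so that the most frequent matches come first should reduce the average number of tests per observation, though the gain depends on the data.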

PGStats
Opal | Level 21

Make sure your pattern uses the "o" suffix, as in "/abc[a-c]+/o", as it signals to the compiler that the pattern is a constant that only needs to be compiled once.
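An alternative that makes the compile-once behaviour explicit, sketched here under assumptions: compile the pattern with prxparse on the first iteration only and retain the returned id. The data set names, the variable re_abc, and the sample values are hypothetical.

```sas
/* Hypothetical sample data, made up for illustration */
data serials;
  length serial_number $20;
  input serial_number $;
  datalines;
xxabcabxx
nomatch
;
run;

data checked;
  set serials;
  retain re_abc;
  /* Compile the pattern once, on the first observation, and keep the id */
  if _n_ = 1 then re_abc = prxparse('/abc[a-c]+/');
  /* Reuse the compiled pattern for every observation */
  check = (prxmatch(re_abc, serial_number) > 0);
  drop re_abc;
run;
```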

PG
ChrisNZ
Tourmaline | Level 20

@PGStats 

My understanding was that recent versions of SAS (9.4?) apply the o suffix by default if the RegEx string is a constant.

I can't find a source though, so maybe I am mistaken.

Update: I did a quick test, and this runs the same with and without the o.

data _null_;
 do I=1 to 1e7; 
   R=prxmatch('/\d\w\d/o',cat(I));
 end;
run;
Patrick
Opal | Level 21

As others already wrote: Certainly use ELSE, and use functions like find() or index() where possible.

If leading and trailing blanks are not important, then use STRIP() as well: prxmatch(<regex>,strip(<variable>))

And last but not least: Tweak your RegEx, especially the ones applied to long strings, e.g. greedy vs. lazy quantifiers.
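To illustrate the greedy vs. lazy distinction, here is a small sketch with an invented string and invented variable names. It only shows how the two quantifiers differ in what they match; whether one is faster depends on the pattern and the data.

```sas
data quantifier_demo;
  length text greedy lazy $40;
  text = 'id=<A1> other=<B2>';
  re_greedy = prxparse('/<.*>/');    /* .* takes as much as possible  */
  re_lazy   = prxparse('/<.*?>/');   /* .*? takes as little as possible */

  /* Greedy: runs to the end of the string, then backtracks to the last > */
  call prxsubstr(re_greedy, text, pos, len);
  if pos > 0 then greedy = substr(text, pos, len);   /* <A1> other=<B2> */

  /* Lazy: stops at the first > it can reach */
  call prxsubstr(re_lazy, text, pos, len);
  if pos > 0 then lazy = substr(text, pos, len);     /* <A1> */

  put greedy= lazy=;
run;
```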
