I have a table in SAS with ~15,000 peoples' business name and PO Box with their ID number. I need to match on the PO Box in an Excel sheet, but in the SAS table PO Box is written all sorts of ways i.e., P O Box, P.O. Box POBox, P. Box, etc. I found a SUGI paper where someone used Perl reg expressions to reformat all to the same P O Box format. I used this code setting my SAS table where in the paper they used a datalines with just a few lines of code. What I wrote works (tried it for 6 ID numbers); however, it is SO slow. It took 5 minutes to match 6 and I need to do this for 15,000...is there a better faster way to do this? Please help! What I have is
Data pobox_fix; Set POBox; Pochg=prxchange("s/P?\s*\.*\s*O?\s*\.*\s*BOX\s*\.*\s*/P O BOX /i",-1,address); Run;
I'm new to these Perl reg expressions but I have read several papers at this point.