Hello all, I am trying to do some address matching and am attempting pre-clean up and want to remove (ST|ND|RD|TH) from any address line to have better matching where some add it some do not... of course I have to match in the case where the word starts with a number not a street name like Sand street. have: 1104 E 148TH ST 1150 W 10TH ST 6836 34TH ST 4260 54TH ST 315 W 14TH ST 215 9TH ST 5601 47TH AVE 115 1ST AVE 1100 47TH ST 392 16TH ST 379 12TH ST 5550 N 1ST ST 4411 N 1ST ST 1325 E 4TH ST 1301 W 182ND ST 1230 WEST 177TH STREET 1023 14TH ST 750 W 10TH ST 321 16TH ST 1400 E 8TH ST 103 9TH STREET 125 EAST 6TH 418 W 8TH ST 131 E 28TH AVE 2565 E 58TH ST 631 12TH STREET 312 W 7TH ST 205 S 1ST ST 1900 18TH AVENUE 42145 30TH ST W 44514 20TH ST W 44701 32ND ST W 1350 3RD STREET
want 1104 E 148 ST 1150 W 10 ST 6836 34 ST 4260 54 ST 315 W 14 ST 215 9 ST 5601 47 AVE 115 1 AVE 1100 47 ST 392 16 ST 379 12 ST 5550 N 1 ST 4411 N 1 ST 1325 E 4 ST 1301 W 182 ST 1230 WEST 177 STREET 1023 14 ST 750 W 10 ST 321 16 ST 1400 E 8 ST 103 9 STREET 125 EAST 6 418 W 8 ST 131 E 28 AVE 2565 E 58 ST 631 12 STREET 312 W 7 ST 205 S 1 ST 1900 18 AVENUE 42145 30 ST W 44514 20 ST W 44701 32 ST W 1350 3 STREET
proc sql;
create table my_clean as
SELECT Id
,Name
,Country
,Address1
/* This code does not work */
,compbl(prxchange('s/\s\d{1,}(ST|ND|RD|TH)\s/\s\d{1,}()\s/', -1, upper(Address1))) as clean_Address1
/* This line works just fine to find the bad guys */
,prxmatch('/\s\d{1,}(ST|ND|RD|TH)\s/', upper(Address1)) as find_numsuffix1
,Address2
,compbl(upper(Address2)) as clean_Address2
/* This line works just fine to find the bad guys too */
,prxmatch('/\s\d{1,}(ST|ND|RD|TH)\s/', upper(Address2)) as find_numsuffix2
,City
,State
,Zip
FROM my_source
;QUIT;
example junk I am getting back: 5550 N\s\d{1,}()\sST 4411 N\s\d{1,}()\sST 1325 E\s\d{1,}()\sST 1301 W\s\d{1,}()\sST 1230 WEST\s\d{1,}()\sSTREET 1023\s\d{1,}()\sST 750 W\s\d{1,}()\sST 321\s\d{1,}()\sST 1400 E\s\d{1,}()\sST 103\s\d{1,}()\sSTREET 125 EAST\s\d{1,}()\s 418 W\s\d{1,}()\sST 131 E\s\d{1,}()\sAVE 2565 E\s\d{1,}()\sST 631\s\d{1,}()\sSTREET 312 W\s\d{1,}()\sST 205 S\s\d{1,}()\sST 1900\s\d{1,}()\sAVENUE 42145\s\d{1,}()\sST W 44514\s\d{1,}()\sST W 44701\s\d{1,}()\sST W 1350\s\d{1,}()\sSTREET 44662\s\d{1,}()\sST W 235 E\s\d{1,}()\sST 4400 E\s\d{1,}()\sST 325 E\s\d{1,}()\sST 1575 WEST\s\d{1,}()\sSTREET in this case it is taking my replacement string as literal and It needs to bring back the part of the number and space it matched on leaving the ST | ND | RD | TH behind. I am not sure I have seen an example like this before. if so I have forgotten. I hope there is an easer way, that does not require a SAS server module because we do not have the software or budget. PS data step or other options are okay too. I just mainly write proc sql code... TIA -KJ
... View more