Hi there, I need to remove the apartment number (secondary unit) in some dirty data. The apartment number comes in many different flavors such as shown in the code below. I am trying to use PRXCHANGE to remove all matches, along with a negative lookahead such as (?!TH) to not match any street names such as '108TH'. The negative lookbehind can be expanded to (?!ST|ND|RD|TH) for firST, secoND, thiRD, and any i-th. I need help concocting a regex for this. So far my regex is not working, I can get it to take all words with numbers out, but can't get the negative look ahead to work in conjunction. Currently, I have "s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!ST|ND|RD|TH)\bs//" s/ is required for PRXCHANGE. I have not gotten it to work otherwise. \b is for the word boundary ([#A-Z]*) is to match any number of alpha characters plus # ([0-9]+) the word needs to have at least one numeric character ([#A-Z]*) the alpha characters can also appear afterwards (?!ST|ND|RD|TH) is the negative look ahead data dirty;
var='W EXAMPLE ROAD #707'; output;
var='N 108TH STREET'; output;
var='S MAIN #D44'; output;
var='SOUTH OAK ROAD 1C'; output;
var='EAST MAIN STREET APT 4 B'; output;
run;
data clean;
set dirty;
*Remove string matches with numeric optionally mixed with alpha (to remove apartment numbers such as: 3B, 3, B3, #3, 3#, #3B);
*^(?!TH) specifies not to match words ending with 'TH';
*s/ is required at both ends. I dont know why, something about replacement text.;
var2 = PRXCHANGE("s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!TH)\bs//",-1, var);
run; The output should be: 'W EXAMPLE ROAD'
'N 108TH STREET'
'S MAIN'
'SOUTH OAK ROAD'
'EAST MAIN STREET APT' Edit: to add expected output and change 'look behind' to 'look ahead'
... View more