BookmarkSubscribeRSS Feed
Stuart_P
Calcite | Level 5

Hi there,

I need to remove the apartment number (secondary unit) in some dirty data.  The apartment number comes in many different flavors such as shown in the code below.  I am trying to use PRXCHANGE to remove all matches, along with a negative lookahead such as (?!TH) to not match any street names such as '108TH'.  The negative lookbehind can be expanded to (?!ST|ND|RD|TH) for firST, secoND, thiRD, and any i-th. 

 

I need help concocting a regex for this. So far my regex is not working, I can get it to take all words with numbers out, but can't get the negative look ahead to work in conjunction. 
Currently, I have "s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!ST|ND|RD|TH)\bs//"

s/         is required for PRXCHANGE. I have not gotten it to work otherwise.

\b         is for the word boundary

([#A-Z]*)  is to match any number of alpha characters plus #

([0-9]+)   the word needs to have at least one numeric character

([#A-Z]*)  the alpha characters can also appear afterwards

(?!ST|ND|RD|TH)    is the negative look ahead

 

 

data dirty;
var='W EXAMPLE ROAD #707'; output;
var='N 108TH STREET'; output;
var='S MAIN #D44'; output;
var='SOUTH OAK ROAD 1C'; output;
var='EAST MAIN STREET APT 4 B'; output;
run;
data clean; set dirty; *Remove string matches with numeric optionally mixed with alpha (to remove apartment numbers such as: 3B, 3, B3, #3, 3#, #3B); *^(?!TH) specifies not to match words ending with 'TH'; *s/ is required at both ends. I dont know why, something about replacement text.; var2 = PRXCHANGE("s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!TH)\bs//",-1, var); run;

 

The output should be:

 

'W EXAMPLE ROAD'
'N 108TH STREET'
'S MAIN'
'SOUTH OAK ROAD'
'EAST MAIN STREET APT'

 

Edit: to add expected output and change 'look behind' to 'look ahead'

2 REPLIES 2
kiranv_
Rhodochrosite | Level 12

what should be the output from the data you have given

Stuart_P
Calcite | Level 5

Sorry, I should have included this in the original post. It should look like this:

 

'W EXAMPLE ROAD'
'N 108TH STREET'
'S MAIN'
'SOUTH OAK ROAD'
'EAST MAIN STREET APT'

  I already have code to remove second unit designators such as 'APT', 'APARTMENT', 'UNIT', ect.

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 2632 views
  • 0 likes
  • 2 in conversation