Stuart_P
Calcite | Level 5

Hi there,

I need to remove the apartment number (secondary unit) from some dirty data. The apartment number comes in many different flavors, as shown in the code below. I am trying to use PRXCHANGE to remove all matches, along with a negative lookahead such as (?!TH) so that street names such as '108TH' are not matched. The negative lookahead can be expanded to (?!ST|ND|RD|TH) for firST, secoND, thiRD, and any i-th.

 

I need help concocting a regex for this. So far my regex is not working: I can get it to take out all words containing numbers, but I can't get the negative lookahead to work in conjunction with that.
Currently, I have "s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!ST|ND|RD|TH)\bs//"

s/         is required for PRXCHANGE. I have not gotten it to work otherwise.

\b         is for the word boundary

([#A-Z]*)  is to match any number of alpha characters plus #

([0-9]+)   the word needs to have at least one numeric character

([#A-Z]*)  the alpha characters can also appear afterwards

(?!ST|ND|RD|TH)    is the negative look ahead

 

 

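data dirty;
length var $ 40; *var would otherwise take length 19 from the first assignment and truncate the last value;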
var='W EXAMPLE ROAD #707'; output;
var='N 108TH STREET'; output;
var='S MAIN #D44'; output;
var='SOUTH OAK ROAD 1C'; output;
var='EAST MAIN STREET APT 4 B'; output;
run;
data clean;
set dirty;
*Remove string matches with numeric optionally mixed with alpha (to remove apartment numbers such as: 3B, 3, B3, #3, 3#, #3B);
*^(?!TH) specifies not to match words ending with 'TH';
*s/ is required at both ends. I do not know why, something about replacement text.;
var2 = PRXCHANGE("s/\b(([#A-Z]*)([0-9]+)([#A-Z]*))(?!TH)\bs//",-1, var);
run;

 

The output should be:

 

'W EXAMPLE ROAD'
'N 108TH STREET'
'S MAIN'
'SOUTH OAK ROAD'
'EAST MAIN STREET APT'

 

Edit: to add expected output and change 'look behind' to 'look ahead'

2 REPLIES
kiranv_
Rhodochrosite | Level 12

What should the output be for the data you have given?

Stuart_P
Calcite | Level 5

Sorry, I should have included this in the original post. It should look like this:

 

'W EXAMPLE ROAD'
'N 108TH STREET'
'S MAIN'
'SOUTH OAK ROAD'
'EAST MAIN STREET APT'

I already have code to remove secondary unit designators such as 'APT', 'APARTMENT', 'UNIT', etc.
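
For anyone landing on this thread, here is a minimal sketch of one way the pattern could be restructured. It is not a tested production rule. Two observations about the posted pattern motivate it. First, PRXCHANGE takes the form s/pattern/replacement/, so in the posted "...\bs//" the final s is read as part of the pattern (a literal "s" must follow the token), not as a closing marker. Second, a lookahead placed after the greedy [#A-Z]* is only tested once the trailing letters such as TH have already been consumed, so it cannot protect '108TH'. The sketch moves the lookahead in front of the token instead. The length of 40, the use of (?!\S) in place of \b (so that a leading or trailing # stays inside the token), and the optional detached single letter (for '4 B') are my own assumptions, chosen to fit the five sample rows.

```sas
/* Sample data from the post; the explicit length is an added assumption */
data dirty;
  length var $ 40;
  var='W EXAMPLE ROAD #707'; output;
  var='N 108TH STREET'; output;
  var='S MAIN #D44'; output;
  var='SOUTH OAK ROAD 1C'; output;
  var='EAST MAIN STREET APT 4 B'; output;
run;

data clean;
  set dirty;
  /* s/pattern/replacement/ with an empty replacement deletes each match.
     \s+                           whitespace in front of the unit token
     (?!\d+(?:ST|ND|RD|TH)(?!\S))  do not start on ordinals: 108TH, 1ST, 2ND, 3RD
     [#A-Z]*\d+[#A-Z]*(?!\S)       whole token with digits: 3, 3B, B3, #3, 3#, #3B
     (?:\s+[A-Z](?!\S))?           optional detached unit letter, as in 4 B */
  var2 = prxchange('s/\s+(?!\d+(?:ST|ND|RD|TH)(?!\S))[#A-Z]*\d+[#A-Z]*(?!\S)(?:\s+[A-Z](?!\S))?//', -1, var);
run;
```

Because the pattern requires whitespace in front of the token, a house number at the very start of the value is left alone. The detached-letter rule is the riskiest part: it would also remove a one-letter directional that follows a unit number, so it may need tightening for real address data.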
