cleaning character columns

Occasional Contributor
Posts: 14

Hi everyone,

 

I have a long list of addresses and I need to extract just the street name:

 

Dummy data set:

 

Addresses (column name)

1000 Ngapenga rd

25 Gill Lane

po box 234

174/H Mangatin drive

102b te hono st

Te pahu rd

162 No 2 rd

 

I want the extracted street name to look like this:

Street_name:

Ngapenga

Gill

Mangatin

Te Hono

Tepahu

No 2

 

My code is currently below: 

 

data street_names; /* output name assumed; the DATA statement was not shown in the post */
set customer_addy;
x = anydigit(addresses,1);
if x = 1 then street_name = substr(addresses,2,length(scan(addresses,2, ' ')));
run;

 

I cannot get my head around how to take all of these conditions into account. Any help is appreciated.

 

Thanks



All Replies
Trusted Advisor
Posts: 1,389

Re: cleaning character columns

You could try TRANSLATE to replace digits with spaces, and TRANWRD to replace constants such as ' rd ', ' st ', ' road ', ' street ', ' lane ', etc. with spaces, keeping lowercase/uppercase in mind. Then apply COMPBL to the result and check whether there is more to do.
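A minimal sketch of that sequence, assuming the input data set is customer_addy with a character variable addresses, as in the question. The output name cleaned, the variable address, and the list of street types are illustrative assumptions. TRANWRD replaces substrings rather than delimited words, so the sketch adds a leading blank and includes the surrounding blanks in each target.

```sas
/* Sample rows copied from the question */
data customer_addy;
  infile datalines truncover;
  input addresses $40.;
  datalines;
1000 Ngapenga rd
25 Gill Lane
po box 234
174/H Mangatin drive
102b te hono st
Te pahu rd
162 No 2 rd
;

data cleaned;                      /* output name is an assumption */
  set customer_addy;
  length address $42;
  /* Upper-case once, map each of the ten digits to a blank, and add a
     leading blank so the first word is delimited on both sides */
  address = ' ' || translate(upcase(addresses), '          ', '0123456789');
  /* Blanks on both sides of each target leave EASTERN or ANDRE intact */
  address = tranwrd(address, ' ROAD ', ' ');
  address = tranwrd(address, ' RD ', ' ');
  address = tranwrd(address, ' STREET ', ' ');
  address = tranwrd(address, ' ST ', ' ');
  address = tranwrd(address, ' DRIVE ', ' ');
  address = tranwrd(address, ' LANE ', ' ');
  /* Collapse runs of blanks and remove leading/trailing blanks */
  address = strip(compbl(address));
run;
```

Blanking every digit also removes the "2" in "No 2" and leaves fragments such as the "/H" of "174/H" and the "B" of "102b", so further rules are still needed for those rows.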

Super Contributor
Posts: 259

Re: cleaning character columns

Write down every rule you want to apply to the variable Addresses, then start coding.

 

Deleting the unwanted content may be easier than extracting the required information, although the last line of your example ("162 No 2 rd") makes that approach more complex.

 

Regular expressions seem to be the best way to extract the street names.
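As a rough illustration of the regular-expression route, here is a sketch using PRXPARSE, PRXMATCH and PRXPOSN. The rules it encodes are assumptions inferred from the sample rows: an optional leading house number (with any attached suffix such as "/H" or "b"), then the street name, then one of a short list of street types at the end. The output data set street_names and the list of street types are hypothetical and would need extending for real data.

```sas
/* Sample rows copied from the question */
data customer_addy;
  infile datalines truncover;
  input addresses $40.;
  datalines;
1000 Ngapenga rd
25 Gill Lane
po box 234
174/H Mangatin drive
102b te hono st
Te pahu rd
162 No 2 rd
;

data street_names;                 /* output name is an assumption */
  set customer_addy;
  length street_name $40;
  retain re;
  /* Compile the pattern once:
     optional house number, lazily captured name, street type at the end */
  if _n_ = 1 then
    re = prxparse('/^\s*(?:\d+\S*\s+)?(.+?)\s+(?:rd|road|st|street|dr|drive|lane)\s*$/i');
  /* Rows that do not end in a listed street type (e.g. PO boxes) stay blank */
  if prxmatch(re, addresses) then
    street_name = propcase(prxposn(re, 1, addresses));
  drop re;
run;
```

PROPCASE capitalises each word, so "Te pahu rd" would come out as two words rather than the "Tepahu" shown in the question; if the single-word form is really wanted, that needs its own rule.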

Super User
Posts: 3,110

Re: cleaning character columns

What is your final objective with cleaning address data? Is it by chance anything to do with address matching? If so there are tools and services available that cleanse, standardise and match addresses to a much higher level of quality than you are ever likely to achieve yourself.

 

Your addresses look like New Zealand ones. There are tools available with NZ address localisation that can do what you require without any coding, for example SAS DataFlux.

Regular Contributor
Posts: 194

Re: cleaning character columns

Hello,

 

If you have a comprehensive list of possible street names at your disposal, you can match it against your list of addresses.
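One way this lookup could be sketched is a PROC SQL join that checks whether each known street name occurs as a whole word (or run of words) inside the address. The reference table street_list, its contents, and the output table matched are hypothetical names introduced only for this illustration.

```sas
/* A few sample rows from the question */
data customer_addy;
  infile datalines truncover;
  input addresses $40.;
  datalines;
1000 Ngapenga rd
25 Gill Lane
po box 234
102b te hono st
162 No 2 rd
;

/* Hypothetical reference list of valid street names */
data street_list;
  infile datalines truncover;
  input street_name $30.;
  datalines;
Ngapenga
Gill
Te Hono
No 2
;

proc sql;
  create table matched as
  select a.addresses, s.street_name
  from customer_addy as a
       left join street_list as s
       /* Pad both sides with blanks so only whole words match,
          and compare in upper case to ignore case differences */
       on index(' ' || upcase(compbl(a.addresses)) || ' ',
                ' ' || upcase(strip(s.street_name)) || ' ') > 0;
quit;
```

An address can match more than one entry in the list, and addresses with no match keep a missing street_name, so the result would still need checking.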

Occasional Contributor
Posts: 14

Re: cleaning character columns

 

How do I use the transwrd function for multiple conditions?

 

address = transwrd(upcase(addresses), 'ST', ' ');
address = transwrd(upcase(addresses), 'DR', ' ');

 

This code only keeps the last replacement. If I create multiple variables, i.e. address1 and address2, then I have two different variables, but I need the result in one column.

 

Any help is appreciated

Thanks

 

Solution
Tuesday
Trusted Advisor
Posts: 1,389

Re: cleaning character columns

The function to replace a word is: TRANWRD (not transword).

 

To make multiple replacements, you can do:

 

address = addresses;
address = tranwrd(upcase(address), ' ST', ' ');
address = tranwrd(upcase(address), ' DR', ' ');
address = compbl(address);
 

I have added a space before 'ST' and 'DR' so that they are not replaced when they occur as substrings inside a word (think of EASTERN, ANDRE).
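The same chaining idea can be written as a loop, which may be easier to maintain when the list of street types grows. This is only a sketch: the data set cleaned, the temporary array suffix, and its contents are assumptions, and the last two sample rows are made up to exercise the EASTERN and ANDRE cases. Each pass reads and writes the same variable, which is why the replacements accumulate instead of only the last one surviving. A leading blank alone still matches the start of a longer word (' ST' would also hit ' STANLEY'), so the targets here carry a blank on both sides.

```sas
data customer_addy;
  infile datalines truncover;
  input addresses $40.;
  datalines;
1000 Ngapenga rd
102b te hono st
174/H Mangatin drive
12 Eastern st
7 Andre dr
;

data cleaned;                      /* output name is an assumption */
  set customer_addy;
  length address $42;
  /* Street types to remove; extend as needed */
  array suffix[5] $8 _temporary_ ('RD' 'ST' 'DR' 'DRIVE' 'LANE');
  /* Upper-case once and add a leading blank so every word is delimited */
  address = ' ' || upcase(addresses);
  do i = 1 to dim(suffix);
    /* Each pass works on the result of the previous pass */
    address = tranwrd(address, ' ' || strip(suffix[i]) || ' ', ' ');
  end;
  address = strip(compbl(address));
  drop i;
run;
```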

 
