Hi everyone,
I have a long list of addresses and I need to extract just the street name:
Dummy data set:
Addresses (column name)
1000 Ngapenga rd
25 Gill Lane
po box 234
174/H Mangatin drive
102b te hono st
Te pahu rd
162 No 2 rd
I want the extract street name to look like
Street_name:
Ngapenga
Gill
Mangatin
Te Hono
Tepahu
No 2
My code is currently below:
set customer_addy;
x = anydigit(addresses,1);
if x = 1 then street_name = substr(addresses,2,length(scan(addresses,2, ' ')));
run;
I cannot get my head around how to taken into account all the many conditions. Any help is appreciated.
Thanks
The function to replace a word is: TRANWRD (not transword).
To multiple replacements, you can do:
address = addresses;
address = tranwrd(upcase(address), ' ST', ' ');
address = tranwrd(upcase(address), ' DR', ' ');
address = compbl(address);
I have added a space before the 'ST', 'DR' - to eliminate replacement in case those are substrings
(think of EASTERN, ANDRE)
You may try use translate in order to replace numers into space, and
use tranword to replace constants - like ' rd ', ' st ', ' road ', ' street ', ' lane ', etc. - into spaces,
being aware of lowcase/uppercase, than use compbl the result and check
is ther more to do.
Write down every rule you want to apply to the variable Addresses, then start coding.
Maybe deleting the unwanted content is easier than extracting the required information, the last line of your example give that approach additional complexity.
Regular Expression seem to be the best way to extract the street names.
What is your final objective with cleaning address data? Is it by chance anything to do with address matching? If so there are tools and services available that cleanse, standardise and match addresses to a much higher level of quality than you are ever likely to achieve yourself.
Your addresses look like New Zealand ones. There are tools available with NZ address localisation that can do what you require without any coding, for example SAS's Dataflux.
Hello,
If you have at your disposal a comprehensive list of possible street names, you can use it to match your list of adresses.
How do I use a transwrd function for multiple conditions.
address = transwrd(upcase(addresses), 'ST', ' ');
address = transwrd(upcase(addresses), 'DR', ' ');
This code only takes the last entry. If I create multiple variables i.e. address1, address2 then I have to different varables which I need in 1 column.
Any help is appreciated
Thanks
The function to replace a word is: TRANWRD (not transword).
To multiple replacements, you can do:
address = addresses;
address = tranwrd(upcase(address), ' ST', ' ');
address = tranwrd(upcase(address), ' DR', ' ');
address = compbl(address);
I have added a space before the 'ST', 'DR' - to eliminate replacement in case those are substrings
(think of EASTERN, ANDRE)
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.