BookmarkSubscribeRSS Feed
Roger_Federer_1
Calcite | Level 5

The address field in a table has two forms.  The first is:

123 Fake St Building 2 Suite 150

 

the second is

 

123 Fake St Paris France 24245

 

The parsing node doesn't seem to have a defintion that can handle these patterns.  All of the address definitions create a recipient token.  

 

What would be the best way to go about this?

7 REPLIES 7
SASKiwi
PROC Star

If the automatic parser doesn't handle it then you will need to add a pre-processing "manual" node to correct the address into a form that the parser can handle.

RonAgresta
SAS Employee

Hi,

 

If the output tokens from the parse definition you have chosen to use don't meet your needs, you can use the Customize component of Data Management Studio to adjust them. You can choose to add or delete tokens but doing so may require adjustments in the pattern logic mappings to the output tokens. Take a look at the documentation for Data Management Studio and search for "Managing and Customizing QKBs."

 

Ron 

Roger_Federer_1
Calcite | Level 5

Thanks for the replies.

I guess my biggest problem is that the only locale we have is English (US), and the bad addresses are mainly European.  I've added some additional vocabularies from tables of European cities and states and that has somewhat helped.  I was thinking I would just shoot for getting the city, state, and country tokens generated parsing definition, and then appendthat onto the existing table.  

Am I correct in assuming that dealing with street names and postal codes without the proper locales would be too difficult?

SASKiwi
PROC Star

In my limited experience with Dataflux, the best way to cleanup, standardise and match addresses is with the third-party address add-ons that SAS can provide for a particular country. The add-ons not only deal with the peculiarities of each country's addresses they can accurately identify postal delivery addresses down to delivery ID if your country of interest has these defined.

 

I recently did an exercise with New Zealand addresses and was able to accurately identify over 85% of the addresses right down to delivery ID. This is pretty impressive as there were a lot of non-postal addresses included. You can then use these IDs for accurate geocoding, mapping and other purposes as well as standardising your addresses to the local postal standard.

 

All this might sound like a bit of an overkill, but if you already have Dataflux, getting the country-specfic add-ons would not be much effort or expense.  

Roger_Federer_1
Calcite | Level 5

I wish that were a possibility, but we simply don't have enough international records to justify any expense.  

I wound up resorting to making a table of cases where the Dataflux generated city, state, or country token doesn't match our table, and cleaning it manually.  

SASKiwi
PROC Star

Sounds like a reasonable approach. I would only use a country-specific add-on only if the majority of addresses were for that country.

Amr_Mousa
Calcite | Level 5

i think you might have a workaround better than trying to get an add-ons, mainly you need some knowledge about QKB customization, then you need to put some validation inside your data job to  identify the european addresses like using country to find the countries names in the data(assumed based on the given example). Then you can profile those data to look for certain patterns and build a scheme to standardize those patterns. and then you can take a copy from the parsing address definiton and modify its schemes and vocablaries based on those new patterns to run against that data only, or you can you can use the standard output from the address parser and profile output data to get some patterns which will be used identify those patterns inside expression node using regular if else statements, and create new columns to set the values of the if-else decisions to those columns like for example:-

string country

if 'street name'=="France"

then country ='street name';

 

of course this is a very simple example you need to use some string functions inside expression language to set the condition

i hope this might help you

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2174 views
  • 0 likes
  • 4 in conversation