SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Parsing Address field in Dataflux Studio

New Contributor
Posts: 3

Parsing Address field in Dataflux Studio


The address field in a table has two forms. The first is:

123 Fake St Building 2 Suite 150

and the second is:

123 Fake St Paris France 24245

The Parsing node doesn't seem to have a definition that can handle these patterns. All of the address definitions create a Recipient token.

What would be the best way to go about this?

Super User
Posts: 3,101

Re: Parsing Address field in Dataflux Studio

If the automatic parser doesn't handle it, then you will need to add a pre-processing "manual" node to correct the address into a form the parser can handle.
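To illustrate the pre-processing idea outside DataFlux, here is a minimal sketch of logic that tells the two address forms in the question apart so each can be routed to appropriate handling. The keyword and country lists are assumptions for illustration; real lists would come from your own data.

```python
import re

# Hypothetical vocabulary lists -- replace with lists built from your data.
COUNTRIES = {"france", "germany", "spain"}
UNIT_WORDS = {"building", "suite", "ste", "apt", "unit"}

def classify_address(addr: str) -> str:
    """Roughly classify an address as 'unit' style (Building/Suite)
    or 'international' style (City Country PostalCode)."""
    tokens = addr.lower().split()
    if any(t in UNIT_WORDS for t in tokens):
        return "unit"
    # The international form in the example ends with a postal code
    # and contains a country name somewhere before it.
    if re.fullmatch(r"\d{4,6}", tokens[-1]) and any(t in COUNTRIES for t in tokens):
        return "international"
    return "unknown"
```

Records classified this way could then be corrected or standardised before being passed to the parse definition.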

SAS Super FREQ
Posts: 90

Re: Parsing Address field in Dataflux Studio

Hi,


If the output tokens from the parse definition you have chosen don't meet your needs, you can use the Customize component of Data Management Studio to adjust them. You can add or delete tokens, but doing so may require adjustments to the pattern-logic mappings for the output tokens. Take a look at the Data Management Studio documentation and search for "Managing and Customizing QKBs."


Ron 

New Contributor
Posts: 3

Re: Parsing Address field in Dataflux Studio

Thanks for the replies.

I guess my biggest problem is that the only locale we have is English (US), and the bad addresses are mainly European. I've added some additional vocabularies from tables of European cities and states, and that has helped somewhat. I was thinking I would just shoot for getting the city, state, and country tokens generated by the parsing definition, and then append those onto the existing table.

Am I correct in assuming that dealing with street names and postal codes without the proper locales would be too difficult?
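The vocabulary-lookup approach described above can be sketched outside DataFlux like this. The city and country sets stand in for the European vocabulary tables mentioned; they are assumptions for illustration only.

```python
# Illustrative sketch (not DataFlux): recover city and country tokens
# from a free-form address by matching words against vocabulary tables.
CITIES = {"paris", "berlin", "madrid"}
COUNTRIES = {"france", "germany", "spain"}

def extract_geo_tokens(addr: str) -> dict:
    """Return the first city and country words found in the address,
    so they can be appended as new columns onto the existing table."""
    out = {"city": None, "country": None}
    for word in addr.split():
        w = word.lower()
        if w in CITIES and out["city"] is None:
            out["city"] = word
        elif w in COUNTRIES and out["country"] is None:
            out["country"] = word
    return out
```

This only recovers tokens that appear in the vocabularies, which matches the observation that street names and postal codes are much harder without a proper locale.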

Super User
Posts: 3,101

Re: Parsing Address field in Dataflux Studio


In my limited experience with DataFlux, the best way to clean up, standardise and match addresses is with the third-party address add-ons that SAS can provide for a particular country. The add-ons not only deal with the peculiarities of each country's addresses, they can also accurately identify postal delivery addresses down to delivery ID if your country of interest has these defined.


I recently did an exercise with New Zealand addresses and was able to accurately identify over 85% of the addresses right down to delivery ID. This is pretty impressive as there were a lot of non-postal addresses included. You can then use these IDs for accurate geocoding, mapping and other purposes as well as standardising your addresses to the local postal standard.


All this might sound like overkill, but if you already have DataFlux, getting the country-specific add-ons would not be much effort or expense.

New Contributor
Posts: 3

Re: Parsing Address field in Dataflux Studio

I wish that were a possibility, but we simply don't have enough international records to justify any expense.  

I wound up making a table of cases where the DataFlux-generated city, state, or country token doesn't match our table, and cleaning those manually.
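That mismatch table could be built with a simple comparison against the reference data; here is a hedged sketch outside DataFlux, with hypothetical field names and reference values.

```python
# Illustrative sketch (not DataFlux): flag records whose parsed
# city/country tokens disagree with a reference table, queuing
# them for manual cleanup. IDs and fields are hypothetical.
reference = {"CUST001": {"city": "Paris", "country": "France"}}

def find_mismatches(parsed_records: list) -> list:
    """Return records that differ from the reference table in any
    of the compared fields."""
    bad = []
    for rec in parsed_records:
        ref = reference.get(rec["id"])
        if ref is None:
            continue  # no reference entry to compare against
        for field in ("city", "country"):
            if rec.get(field) != ref[field]:
                bad.append(rec)
                break  # one mismatched field is enough to flag it
    return bad
```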

Super User
Posts: 3,101

Re: Parsing Address field in Dataflux Studio

Sounds like a reasonable approach. I would only use a country-specific add-on if the majority of addresses were for that country.

New Contributor
Posts: 3

Re: Parsing Address field in Dataflux Studio

I think you might have a workaround better than trying to get the add-ons, but it requires some knowledge of QKB customization. First, put some validation inside your data job to identify the European addresses, for example by searching for country names in the data (assumed based on the given example). Then profile that data to look for recurring patterns and build a scheme to standardize those patterns. Next, take a copy of the address parse definition and modify its schemes and vocabularies based on those new patterns, and run it against that data only. Alternatively, you can use the standard output from the address parser, profile the output data to find patterns, and then identify those patterns inside an Expression node using regular if/else statements, creating new columns to hold the values of the if/else decisions. For example:

string country

if 'street name' == "France"
then country = 'street name';

Of course this is a very simple example; you would need to use some string functions inside the expression language to set the condition.

I hope this helps.
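The same if/else idea with string functions can be sketched outside DataFlux like this. The field name and country list are assumptions for illustration, not part of any real QKB.

```python
# Illustrative sketch (not DataFlux expression code): derive a country
# column from a free-form address field using string functions,
# mirroring the if/else logic described above.
KNOWN_COUNTRIES = ["France", "Germany", "Spain"]

def derive_country(street_name: str) -> str:
    """Return the first known country found in the field,
    or an empty string if none matches."""
    upper = street_name.upper()
    for c in KNOWN_COUNTRIES:
        if c.upper() in upper:
            return c
    return ""
```

In a data job, the equivalent Expression-node code would assign the matched value to a new output column for downstream standardization.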

Discussion stats
  • 7 replies
  • 572 views
  • 0 likes
  • 4 in conversation