01-29-2016 08:10 AM - edited 01-29-2016 08:15 AM
The address field in a table has two forms. The first is:
123 Fake St Building 2 Suite 150
the second is
123 Fake St Paris France 24245
The parsing node doesn't seem to have a defintion that can handle these patterns. All of the address definitions create a recipient token.
What would be the best way to go about this?
02-24-2016 06:22 PM
If the automatic parser doesn't handle it then you will need to add a pre-processing "manual" node to correct the address into a form that the parser can handle.
02-25-2016 10:17 AM
If the output tokens from the parse definition you have chosen to use don't meet your needs, you can use the Customize component of Data Management Studio to adjust them. You can choose to add or delete tokens but doing so may require adjustments in the pattern logic mappings to the output tokens. Take a look at the documentation for Data Management Studio and search for "Managing and Customizing QKBs."
03-03-2016 11:02 AM
Thanks for the replies.
I guess my biggest problem is that the only locale we have is English (US), and the bad addresses are mainly European. I've added some additional vocabularies from tables of European cities and states and that has somewhat helped. I was thinking I would just shoot for getting the city, state, and country tokens generated parsing definition, and then appendthat onto the existing table.
Am I correct in assuming that dealing with street names and postal codes without the proper locales would be too difficult?
03-04-2016 06:39 PM - edited 03-06-2016 01:34 AM
In my limited experience with Dataflux, the best way to cleanup, standardise and match addresses is with the third-party address add-ons that SAS can provide for a particular country. The add-ons not only deal with the peculiarities of each country's addresses they can accurately identify postal delivery addresses down to delivery ID if your country of interest has these defined.
I recently did an exercise with New Zealand addresses and was able to accurately identify over 85% of the addresses right down to delivery ID. This is pretty impressive as there were a lot of non-postal addresses included. You can then use these IDs for accurate geocoding, mapping and other purposes as well as standardising your addresses to the local postal standard.
All this might sound like a bit of an overkill, but if you already have Dataflux, getting the country-specfic add-ons would not be much effort or expense.
03-07-2016 04:17 PM
I wish that were a possibility, but we simply don't have enough international records to justify any expense.
I wound up resorting to making a table of cases where the Dataflux generated city, state, or country token doesn't match our table, and cleaning it manually.
03-07-2016 09:55 PM
Sounds like a reasonable approach. I would only use a country-specific add-on only if the majority of addresses were for that country.
03-27-2016 04:44 PM
i think you might have a workaround better than trying to get an add-ons, mainly you need some knowledge about QKB customization, then you need to put some validation inside your data job to identify the european addresses like using country to find the countries names in the data(assumed based on the given example). Then you can profile those data to look for certain patterns and build a scheme to standardize those patterns. and then you can take a copy from the parsing address definiton and modify its schemes and vocablaries based on those new patterns to run against that data only, or you can you can use the standard output from the address parser and profile output data to get some patterns which will be used identify those patterns inside expression node using regular if else statements, and create new columns to set the values of the if-else decisions to those columns like for example:-
if 'street name'=="France"
then country ='street name';
of course this is a very simple example you need to use some string functions inside expression language to set the condition
i hope this might help you