Without a well-known rule for the contents of the address variable, this is nearly impossible to accomplish. Parsing addresses from scratch is more of an art than a science. For example, how do you know when the city is two or more words? You need data from outside the application to inform the effort, which is why address correction software is as complex as it is.

It is also quite possible that some localities have multiple valid city/town names. In such cases the provided locality is not wrong; rather, the address correction software used by the client (if they actually use any) has returned a different but equally valid city name and used it for the locality field they provide. A ZIP code can also straddle state lines, so be careful.

However, to 'fix' the locality so that it has exactly the same value as the address contains, the parsing needed can become quite complex. You would likely need to locate and separate out the city, state, and ZIP code (for US addresses), then use a ZIP/city table to look up the known cities/towns for that ZIP. (SAS actually provides a ZIP table.) When one of those city/town names is contained in the original address, copy it from the ZIP table into the (now updated) locality field. This assumes the ZIP code is correct. You are still left with the case where both the address and the locality are wrong and also do not match each other.
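To make the ZIP-table approach concrete, here is a minimal sketch in Python (chosen only for illustration; the same logic could be written in a SAS data step). Everything named here is hypothetical: the `ZIP_CITIES` dictionary stands in for a real ZIP/city reference table and holds illustrative entries only, and `fix_locality` is a made-up helper. It assumes a US address that ends in a two-letter state and a ZIP code, and it trusts the ZIP code.

```python
import re

# Hypothetical stand-in for a ZIP/city reference table (illustrative
# entries only; real data would come from a maintained ZIP table).
ZIP_CITIES = {
    "10001": ["New York", "Manhattan"],
    "84101": ["Salt Lake City"],
}

# Assumption: the address ends with a two-letter state and a 5-digit ZIP
# (optionally ZIP+4), e.g. "..., NY 10001" or "... UT 84101-1234".
TAIL = re.compile(r",?\s*\b([A-Za-z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def fix_locality(address, locality, zip_cities=ZIP_CITIES):
    """Return the ZIP-table city that the address ends with (before the
    state/ZIP tail), or the original locality if nothing matches."""
    m = TAIL.search(address)
    if m is None:
        return locality  # no recognizable state/ZIP tail; leave as-is
    zip5 = m.group(2)
    # Text before the state/ZIP tail, lowercased for comparison.
    head = address[:m.start()].rstrip(" ,").lower()
    # Check longer names first so a multi-word city is not shadowed
    # by a shorter name that happens to be its suffix.
    for city in sorted(zip_cities.get(zip5, []), key=len, reverse=True):
        if head.endswith(city.lower()):
            return city
    return locality  # ZIP unknown, or no listed city found in the address


print(fix_locality("350 5th Ave, New York, NY 10001", "Manhattan"))
print(fix_locality("123 Main St Salt Lake City UT 84101", "SLC"))
```

Because the candidate city names come from the table, the multi-word city problem goes away for addresses that match. The sketch deliberately leaves the locality untouched whenever the ZIP is unknown or no listed city appears in the address, since that is the case where both fields may be wrong and a human or a real address correction product has to decide.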