<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Distinguishing an actual City value rather than a false positive in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323292#M71646</link>
    <description>&lt;P&gt;Thank you all for the responses and suggestions. I will try some of the solutions here today, and other steps beforehand, to see what else we can do with this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's great to have the support and knowledge of people like yourselves out there, and it is greatly appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'll update with the chosen solution later.&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jan 2017 09:02:07 GMT</pubDate>
    <dc:creator>MR_E</dc:creator>
    <dc:date>2017-01-09T09:02:07Z</dc:date>
    <item>
      <title>Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322975#M71524</link>
      <description>&lt;P&gt;Afternoon All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a situation where I need to remove false positives and keep only the correct matches; in this example, the city Hamburg.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data address_data;
  infile datalines truncover;
  input all_address_one_variable $char38.;
  /* DATALINES4 (ended by ;;;;) lets the data contain a semicolon, as in row (g) */
  datalines4;
23 HAMBURGER ST                         (a)
HAMBURG Germany                         (b)
100 Hamburg Street, Netherlands         (c)
666 schaufhassenstrasse, Hamburg 68801  (d)
12 hamburg st, England                  (e)
147 Hamburg Road Transylvania           (f)
180 Hamburg-Germany;                    (g)
;;;;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The optimal output would show only rows b, d and g.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the question is this: how can I select only the rows where the name Hamburg actually refers to the city, not to a street or road named Hamburg?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm thinking a PRXMATCH or similar is required, but there may be other possibilities.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;MR_E&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 14:26:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322975#M71524</guid>
      <dc:creator>MR_E</dc:creator>
      <dc:date>2017-01-06T14:26:18Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322980#M71525</link>
      <description>PRXMATCH alone can't fix it; you still need some logic to distinguish the city from the other cases you have. In doing so, you would probably end up mimicking the functionality of SAS Data Management Studio and its QKBs (Quality Knowledge Bases). So I would look in that direction, especially if you have more similar tasks to solve.</description>
      <pubDate>Fri, 06 Jan 2017 14:16:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322980#M71525</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2017-01-06T14:16:36Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322986#M71526</link>
      <description>&lt;P&gt;Hi LinusH, thank you for the prompt response. However, this is already part of a large piece of logic, and I need a script that removes these instances and leaves only the correct results. Are there any other approaches or functions to hand that could assist PRXMATCH (or even replace it)?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 14:33:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322986#M71526</guid>
      <dc:creator>MR_E</dc:creator>
      <dc:date>2017-01-06T14:33:00Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322990#M71527</link>
      <description>&lt;P&gt;There seems to be a large set of possible configurations, and you have to work them out first before designing the appropriate filter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following program handles your example by using a regular expression that matches all the unwanted cases:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set address_data;
if not prxmatch("/[a-z]*hamburg([a-z]| st| ro| av)/i",all_address_one_variable);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 06 Jan 2017 14:44:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322990#M71527</guid>
      <dc:creator>gamotte</dc:creator>
      <dc:date>2017-01-06T14:44:17Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322998#M71529</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As has already been said, you'll have to identify clearly what the rules are.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking at your data example, the following would be appropriate:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data OUT;
  set address_data;
  /* keep rows where HAMBURG is a whole word (not HAMBURGER) that is  */
  /* not followed, after any non-letters, by ST or RO (street, road); */
  /* HAMBURG at the very end of the address is also accepted          */
  if prxmatch('/\bhamburg\b(?![^a-z]*(st|ro))/i', all_address_one_variable);
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt;If you need more info about regular expressions, check here: &lt;A href="http://www.regular-expressions.info/" target="_blank"&gt;http://www.regular-expressions.info/&lt;/A&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt;Hope it helps.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt;Daniel Santos&amp;nbsp;@ &lt;A href="http://www.cgd.pt" target="_blank"&gt;www.cgd.pt&lt;/A&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 15:15:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/322998#M71529</guid>
      <dc:creator>Daniel-Santos</dc:creator>
      <dc:date>2017-01-06T15:15:25Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323001#M71530</link>
      <description>&lt;P&gt;Does your example of picking the city HAMBURG in GERMANY mean that the correct answer is picking &lt;U&gt;a specific given city&lt;/U&gt; in &lt;U&gt;a specific given country&lt;/U&gt;?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you trust the addresses to be in the order house number - street name - city name - country name - zip code, even if some parts are absent?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your example:&lt;/P&gt;
&lt;P&gt;- reject the address if ST or STREET follows city_string (ignoring case)&lt;/P&gt;
&lt;P&gt;- accept the address if country_string follows city_string&lt;/P&gt;
&lt;P&gt;- accept the address if a numeric zip code follows city_string&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try the following code, check the results, and add more rules, either to reject or to accept rows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let city_string = HAMBURG;
%let country_string = GERMANY;

data test;
  set address_data;
  adr = upcase(all_address_one_variable);
  /* reject: the first ST in the address comes after the city string */
  if index(adr,'ST') &amp;gt; index(adr,"&amp;amp;city_string") then delete;
  /* accept: the country follows the city, or only digits follow the city */
  if index(adr,"&amp;amp;country_string") &amp;gt; index(adr,"&amp;amp;city_string") or
     scan(adr,-1,' 0123456789') = "&amp;amp;city_string"
     then output;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 06 Jan 2017 15:23:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323001#M71530</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-01-06T15:23:00Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323016#M71531</link>
      <description>&lt;P&gt;I would look at cleaning up/standardizing the addresses first. If you can parse out the components into valid values, then you can move on from there more easily.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 15:55:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323016#M71531</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-06T15:55:02Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323042#M71538</link>
      <description>&lt;P&gt;If this data has been collected since about 1980, then beatings with wet noodles are in order for whoever designed the data collection or data entry system. This problem has been around for a very long time, and the basic fix is to collect the data into separate fields at the start.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Rant over.&lt;/P&gt;
&lt;P&gt;We see this question so often that it is scary how many people are not doing any planning before they collect data.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 18:55:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323042#M71538</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-01-06T18:55:23Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323046#M71541</link>
      <description>&lt;P&gt;A lot of the time nowadays this could be administrative data that is collected for another purpose and then gets used for analysis it was never intended for, or data scraped from different sources, such as text files.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Food for thought: the post office has been doing text recognition and address clean-up for years, and most of these algorithms recreate theirs. Too bad it's not public information, since public dollars fund the development.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 17:03:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323046#M71541</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-06T17:03:15Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323291#M71645</link>
      <description>&lt;P&gt;Ballardw, beatings with wet noodles sound strange, yet painful. I agree: the department responsible for cleansing the data beforehand has not provided us with the optimal data lake, but I could sing and dance about that all day long... trust me, I've had a rant too!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When you say "collect data into separate fields at the start": if we are identifying cities from a variable that contains a full (or partial) address, and we want to determine whether a value is a city or a street given all the possible scenarios, would separating the address into parts help, i.e. tokenisation?&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 08:59:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323291#M71645</guid>
      <dc:creator>MR_E</dc:creator>
      <dc:date>2017-01-09T08:59:00Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323292#M71646</link>
      <description>&lt;P&gt;Thank you all for the responses and suggestions. I will try some of the solutions here today, and other steps beforehand, to see what else we can do with this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's great to have the support and knowledge of people like yourselves out there, and it is greatly appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'll update with the chosen solution later.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 09:02:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323292#M71646</guid>
      <dc:creator>MR_E</dc:creator>
      <dc:date>2017-01-09T09:02:07Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323299#M71647</link>
      <description>&lt;P&gt;On second thought, you had better also check the rejected rows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let city_string = HAMBURG;
%let country_string = GERMANY;

data &lt;STRONG&gt;accepted rejected;&lt;/STRONG&gt;
  set address_data;
  adr = upcase(all_address_one_variable);
  if index(adr,'ST') &amp;gt; index(adr,"&amp;amp;city_string")
     then &lt;STRONG&gt;do; output rejected; return; end;&lt;/STRONG&gt;
  /* accept on the country or zip-code rule; anything else goes to */
  /* REJECTED as well, so that no row silently disappears          */
  if index(adr,"&amp;amp;country_string") &amp;gt; index(adr,"&amp;amp;city_string") or
     scan(adr,-1,' 0123456789') = "&amp;amp;city_string"
     then &lt;STRONG&gt;output accepted&lt;/STRONG&gt;;
  &lt;STRONG&gt;else output rejected;&lt;/STRONG&gt;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You may find that you need to move rows from rejected to accepted, and vice versa.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 09:30:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323299#M71647</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-01-09T09:30:41Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323329#M71652</link>
      <description>&lt;P&gt;Yes, tokenization is what you're after, and there are data quality tools out there that could do a lot of the job for you.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is what SAS has on offer: &lt;A href="https://support.sas.com/documentation/onlinedoc/dfdmstudio/" target="_blank"&gt;https://support.sas.com/documentation/onlinedoc/dfdmstudio/&lt;/A&gt;&lt;/P&gt;
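&lt;P&gt;To illustrate the tokenization idea without extra tools, here is a rough DATA step sketch, assuming the ADDRESS_DATA layout from the original question. It splits each address into words, finds the word HAMBURG and looks at the word that follows it. The list of street-type words (ST, STREET, RD, ROAD, AVE, AVENUE) and the output names CITY_ROWS and OTHER_ROWS are assumptions for illustration only, not a complete rule set.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data address_data;
  infile datalines truncover;
  input all_address_one_variable $char38.;
  datalines4;
23 HAMBURGER ST
HAMBURG Germany
100 Hamburg Street, Netherlands
666 schaufhassenstrasse, Hamburg 68801
12 hamburg st, England
147 Hamburg Road Transylvania
180 Hamburg-Germany;
;;;;
run;

data city_rows other_rows;
  set address_data;
  length next_word $38;
  /* Tokenize on blanks, commas, semicolons and hyphens, so that   */
  /* "180 Hamburg-Germany;" gives the tokens 180, HAMBURG, GERMANY */
  n_words = countw(all_address_one_variable, ' ,;-');
  is_city = 0;
  do i = 1 to n_words;
    if upcase(scan(all_address_one_variable, i, ' ,;-')) = 'HAMBURG' then do;
      /* The token after HAMBURG; blank when HAMBURG is the last word */
      next_word = ' ';
      if i lt n_words then
        next_word = upcase(scan(all_address_one_variable, i + 1, ' ,;-'));
      /* Hypothetical street-type list: any other follower counts as the city */
      if next_word not in ('ST', 'STREET', 'RD', 'ROAD', 'AVE', 'AVENUE') then is_city = 1;
    end;
  end;
  if is_city then output city_rows;
  else output other_rows;
  drop i n_words next_word;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On the seven sample rows this should write rows b, d and g to CITY_ROWS and the rest to OTHER_ROWS. Because HAMBURGER is a different token from HAMBURG, row a never counts as a city, and an address that ends in Hamburg is treated as the city.&lt;/P&gt;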
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 11:46:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323329#M71652</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-01-09T11:46:42Z</dc:date>
    </item>
    <item>
      <title>Re: Distinguishing an actual City value rather than a false positive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323414#M71678</link>
      <description>&lt;P&gt;If the data are entered/collected in a fixed order and there is something to delimit the fields, such as a comma, e.g. street, city, state/province, then at worst you can parse by order of appearance.&lt;/P&gt;
&lt;P&gt;If the data aren't kept in any order, then you get into identifying the "clean", well-formatted data first and then looking for patterns in the remaining data to process.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there is no order, then things could get very interesting. But if the general order above is maintained, even with a "missing" item, then some searches could be done, such as "is the last item a state name?" (start by looking in the variables for something with the lowest number of levels). If so, search to see whether the next-to-last item is 1) a city name and 2) in that state; the remainder would be street bits. If the last item is not a state, then search for cities. If it isn't a city either, then fun ensues.&amp;nbsp;&lt;/P&gt;
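&lt;P&gt;A minimal sketch of that search order in a DATA step, assuming the components are comma-delimited and using a country in place of a state. The sample addresses, the country list and the city list are hypothetical placeholders; in practice the lists would come from a reference table of valid values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data parsed;
  infile datalines truncover;
  input address $char60.;
  length last_part prev_part country city street $60;
  n_parts = countw(address, ',');
  last_part = upcase(strip(scan(address, -1, ',')));
  prev_part = upcase(strip(scan(address, -2, ',')));
  /* Step 1: is the last item a country (the component with the fewest levels)? */
  if last_part in ('GERMANY', 'NETHERLANDS', 'ENGLAND') then do;
    country = last_part;
    /* Step 2: is the next-to-last item a known city? (hypothetical list) */
    if prev_part in ('HAMBURG', 'AMSTERDAM', 'LONDON') then city = prev_part;
    stop_at = n_parts - 1 - (city ne ' ');
  end;
  /* Step 3: no country at the end, so test the last item as a city */
  else if last_part in ('HAMBURG', 'AMSTERDAM', 'LONDON') then do;
    city = last_part;
    stop_at = n_parts - 1;
  end;
  else stop_at = n_parts;
  /* Whatever remains at the front is treated as the street part */
  do i = 1 to stop_at;
    street = catx(', ', street, strip(scan(address, i, ',')));
  end;
  drop i last_part prev_part stop_at;
  datalines;
12 Main St, Hamburg, Germany
100 Hamburg Street, Amsterdam, Netherlands
12 Hamburg St, England
5 Elm Road, London
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With these inputs, 12 Hamburg St, England gets the country ENGLAND, no city and the street 12 Hamburg St, so the street named Hamburg is not mistaken for the city.&lt;/P&gt;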
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have a postal code similar to the United States Zip code, then it can tell you the city and state/province, which helps in finding the part that is the street address, because our Zip identifies a city and state. Find those in the unstructured data, and what is left is the street.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2017 17:12:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Distinguishing-an-actual-City-value-rather-than-a-false-positive/m-p/323414#M71678</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-01-09T17:12:22Z</dc:date>
    </item>
  </channel>
</rss>

