<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to isolate and select the specific string matching on the string in the second reference dat in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645483#M192976</link>
    <description>&lt;P&gt;This kind of work is rather iterative:&lt;/P&gt;
&lt;P&gt;You match data, then look at what hasn't matched, then discover new words or new defects or new patterns, match again, etc.&lt;/P&gt;</description>
    <pubDate>Tue, 05 May 2020 23:59:40 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-05-05T23:59:40Z</dc:date>
    <item>
      <title>How to isolate and select the specific string matching on the string in the second reference dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645370#M192912</link>
      <description>&lt;P&gt;Hi Folks:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have about 20 survey tables that need to be aligned to a REFERENCE dataset. ID1NAME and IDNAME are the merge keys: ID1NAME is the province name and IDNAME is the city name. City names are not unique, but the combination of ID1NAME and IDNAME is. ID1NAME can be cleaned because its errors at least follow some logic. IDNAME in the survey tables, however, is messy, with no logic that I can see. The province name is included in the city name, sometimes at the front and sometimes at the end, varying across rows. Explanatory but non-standardized text also appears, such as integrated, combined or altogether, to name a few. The delimiter can be 1-4 spaces, again varying across rows. I think the way to standardize IDNAME in a survey table is to split it on spaces, store each word in its own variable, and put those variables in an array. Then I would check whether the IDNAME of the REFERENCE data appears in that array. For example, the IDNAME of the fourth row (Integrated Cheongju City) would give three variables: var1 for Integrated, var2 for Cheongju and var3 for City, which form the array var1-var3.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the IDNAME of the REFERENCE data is found in the array var1-var3, it would be output as the clean IDNAME in a new variable in the survey dataset.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you please help me implement this idea in SAS, if it makes sense? Below are mock datasets. I couldn't figure out how to read strings separated by space(s) into one variable. I apologize!&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA SURVEY;
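/* The &amp;amp; modifier lets IDNAME contain single blanks; the value ends at two or more blanks or at end of line */
INPUT ID1NAME :$25. IDNAME &amp;amp; $25.;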
CARDS;
Chung sub Total 
Chung Chungju Chung
Chung Jecheon
Chung Integrated Cheongju City
Chung Boeun-gun
Chung Okcheon-gun
Chang Yeongdong-gun
Chang Chang Jincheon-gun
Chang Chang Goesan-gun 
Chang Combined Eumseong City 
Chang Danyang-gun 
Chang Jeungpyeong
;
DATA REFERENCE; 
INPUT ID1NAME :$25. IDNAME :$25.;
CARDS;
Chung Chungju
Chung Jecheon
Chung Cheongju
Chung Boeun
Chung Okcheon
Chang Yeongdong
Chang Jincheon
Chang Goesan
Chang Eumseong
Chang Danyang
Chang Jeungpyeong
; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 18:42:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645370#M192912</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T18:42:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645445#M192945</link>
      <description>&lt;P&gt;PRXMATCH is one way to do this.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;proc sql ;
    create table match as
       select r.id1name, r.idname, s.idname as messyname 
          from reference as r
          left join survey as s
          on r.id1name = s.id1name 
          where prxmatch("/"||strip(r.idname)||"/",s.idname)&amp;gt;0;
quit ;&lt;/PRE&gt;</description>
      <pubDate>Tue, 05 May 2020 22:43:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645445#M192945</guid>
      <dc:creator>biopharma</dc:creator>
      <dc:date>2020-05-05T22:43:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645476#M192969</link>
      <description>&lt;P&gt;I would separate the values so that you have:&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;Chang Combined Eumseong City&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;going to:&lt;/P&gt;
&lt;P&gt;PROVINCE=Chang&lt;/P&gt;
&lt;P&gt;CITY=Eumseong&lt;/P&gt;
&lt;P&gt;EXPL=Combined City&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can then focus on matching the City names.&lt;/P&gt;
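&lt;P&gt;A minimal sketch of that separation, assuming SURVEY has already been read with ID1NAME and IDNAME in separate variables. The noise-word list, the -gun suffix rule and the names SURVEY_CLEAN, CITY, EXPL and UNMATCHED are illustrative assumptions, not part of the original data:&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;/* Sketch only: split the messy IDNAME into CITY and EXPL (noise words). */
data survey_clean;
   set survey;
   length city expl word $25;
   /* COUNTW and SCAN treat runs of blanks as one delimiter, so 1-4 spaces are handled alike */
   do i = 1 to countw(idname, ' ');
      word = scan(idname, i, ' ');
      /* assumed list of known noise words; extend it as new ones are found */
      if upcase(word) in ('INTEGRATED', 'COMBINED', 'CITY', 'SUB', 'TOTAL')
         then expl = catx(' ', expl, word);
      /* a word equal to the province name is dropped; anything else is kept as city */
      else if upcase(word) ne upcase(id1name)
         then city = catx(' ', city, word);
   end;
   /* assumed rule: strip a trailing -gun so Boeun-gun becomes Boeun */
   city = prxchange('s/-gun\s*$//i', 1, city);
   drop i word;
run;

/* List the survey rows whose cleaned CITY has no partner in REFERENCE. */
proc sql;
   create table unmatched as
      select s.*
         from survey_clean as s
         left join reference as r
         on s.id1name = r.id1name
            and upcase(s.city) = upcase(r.idname)
         where r.idname is missing;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Rows that end up in UNMATCHED (the sub Total row, for instance, whose CITY comes out blank) point to noise words or patterns that the lists above do not cover yet.&lt;/P&gt;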
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PS:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;&amp;nbsp;I couldn't figure how to input strings separated by space(s)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Use the &amp;amp; format modifier on the INPUT statement. See&amp;nbsp;&lt;A href="https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/253-29.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/253-29.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 23:45:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645476#M192969</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-05T23:45:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645479#M192972</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you so much. I was beginning to think no one would respond to this post.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The problem is that for this particular row&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;Eumseong&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;is the city name, in the third position. In another row, though, the city name appears in the second position. So I don't know which word matches the city in the reference data, i.e., which of Chang, Eumseong or Combined City is the name of the city. And I have 20 survey tables, which makes manual review almost impossible.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 23:53:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645479#M192972</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T23:53:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645480#M192973</link>
      <description>&lt;P&gt;My point was to remove noise words such as &lt;EM&gt;Combined&lt;/EM&gt; into a third variable.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 23:54:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645480#M192973</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-05T23:54:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645482#M192975</link>
      <description>Noise words also happen at the beginning of the string, such as Integrated, Combined, Sub total, etc.</description>
      <pubDate>Tue, 05 May 2020 23:56:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645482#M192975</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T23:56:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645483#M192976</link>
      <description>&lt;P&gt;This kind of work is rather iterative:&lt;/P&gt;
&lt;P&gt;You match data, then look at what hasn't matched, then discover new words or new defects or new patterns, match again, etc.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 23:59:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645483#M192976</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-05T23:59:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645484#M192977</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt;Noise words also happen at the beginning of the string.&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Their location does not matter. They are known words, so easy to spot.&lt;/P&gt;
&lt;P&gt;However, *where they appear* may serve as a separator, perhaps between province and city. So the moment you remove them is also a good time to check for that.&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2020 00:02:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645484#M192977</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-06T00:02:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645485#M192978</link>
      <description>This is exactly what I'm doing right now. I wondered if there was a more automated way of doing this. I'm working through the mismatched data one row at a time. This work involves Google Translate as well.</description>
      <pubDate>Wed, 06 May 2020 00:05:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645485#M192978</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-06T00:05:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645490#M192982</link>
      <description>&lt;P&gt;Hopefully you needn't work on a row-by-row basis. I'd hope many rows have similar patterns, and once you code for a pattern all the concerned rows are processed adequately.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2020 00:51:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645490#M192982</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-06T00:51:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to isolate and select the specific string matching on the string in the second reference dat</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645497#M192987</link>
      <description>Thanks for reassuring me that the iterative work I'm doing is at least not unnecessary. This is very important to know. Thanks Chris.</description>
      <pubDate>Wed, 06 May 2020 01:51:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-isolate-and-select-the-specific-string-matching-on-the/m-p/645497#M192987</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-06T01:51:25Z</dc:date>
    </item>
  </channel>
</rss>

