<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merging/Joining Datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384044#M91641</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/148988"&gt;@takumamih&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;On top of what&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;wrote: If you've got the Data Quality Server licensed then you can also create match codes using dqmatch() and then join over these match codes.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#p09nffezbjyj4on11oblz77aq1x6.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#p09nffezbjyj4on11oblz77aq1x6.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Based on the data you've posted you would still need to implement some data prep logic before feeding into dqmatch() as the strings look too different to each other as that I'd expect dqmatch() to return the same match codes for values you'd like to get matched.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What using dqmatch() would solve for you:&lt;/P&gt;
&lt;P&gt;1. You're creating first these match codes on row level (that's the fuzzy part) but then join via an exact match over these match codes and though don't need anymore compare everything with everything.&lt;/P&gt;
&lt;P&gt;2. You take advantage of a knowledge base (QKB) which comes already with a lot of pre-defined rules so you don't need to start from scratch.&lt;/P&gt;</description>
    <pubDate>Sun, 30 Jul 2017 01:27:13 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2017-07-30T01:27:13Z</dc:date>
    <item>
      <title>Merging/Joining Datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384037#M91637</link>
      <description>&lt;P&gt;I am trying to join two datasets, but the common variable is not exactly identical. Would I have to adjust the names of one of the datasets in order to successfully merge?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is potentially a problem because I am trying to match the lat_lon_sort dataset with several different datasets that look similar to the O3 dataset, but with slightly different variable names.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;lat_lon_sort:&lt;/P&gt;&lt;P&gt;Abbotsford_A_Columbia_Street&lt;/P&gt;&lt;P&gt;Abbotsford_Central&lt;/P&gt;&lt;P&gt;Agassiz_municipal_Hall&lt;/P&gt;&lt;P&gt;Burnaby_Burmount&lt;/P&gt;&lt;P&gt;Burnaby_Kensington_Park&lt;/P&gt;&lt;P&gt;Burnaby_Mountain&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;O3:&lt;/P&gt;&lt;P&gt;Abfd_A_Col_St_O3&lt;/P&gt;&lt;P&gt;Abfd_A_O3&lt;/P&gt;&lt;P&gt;Abfd_Central_O3&lt;/P&gt;&lt;P&gt;Aggsz_Mun_Hall_O3&lt;/P&gt;&lt;P&gt;Burnaby_Kens_Park_O3&lt;/P&gt;&lt;P&gt;Burnaby_Mountain_O3&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jul 2017 23:12:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384037#M91637</guid>
      <dc:creator>takumamih</dc:creator>
      <dc:date>2017-07-29T23:12:05Z</dc:date>
    </item>
    <item>
      <title>Re: Merging/Joining Datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384038#M91638</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/148988"&gt;@takumamih&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;I am trying to join two datasets, but the common variable is not exactly identical. Would I have to adjust the names of one of the datasets in order to successfully merge?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Yes. This is known as fuzzy matching and is a very difficult issue to solve because it's hard to identify the matches. You first need to consider how accurate the reuslts need to be before you can determine what approach to take.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;If close enough is ok, you can look at using the COMPGED/COMPLEV or similar functions to match. They provide a 'distance' measure for character variables so you can see how similar they are when comparing. You also need to consider timing, because its not an exact match you need to now compare each value to every other value which can really really be problematic for big datasets.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jul 2017 23:23:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384038#M91638</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-07-29T23:23:47Z</dc:date>
    </item>
    <item>
      <title>Re: Merging/Joining Datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384044#M91641</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/148988"&gt;@takumamih&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;On top of what&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;wrote: If you've got the Data Quality Server licensed then you can also create match codes using dqmatch() and then join over these match codes.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#p09nffezbjyj4on11oblz77aq1x6.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/dqclref/70016/HTML/default/viewer.htm#p09nffezbjyj4on11oblz77aq1x6.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Based on the data you've posted you would still need to implement some data prep logic before feeding into dqmatch() as the strings look too different to each other as that I'd expect dqmatch() to return the same match codes for values you'd like to get matched.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What using dqmatch() would solve for you:&lt;/P&gt;
&lt;P&gt;1. You're creating first these match codes on row level (that's the fuzzy part) but then join via an exact match over these match codes and though don't need anymore compare everything with everything.&lt;/P&gt;
&lt;P&gt;2. You take advantage of a knowledge base (QKB) which comes already with a lot of pre-defined rules so you don't need to start from scratch.&lt;/P&gt;</description>
      <pubDate>Sun, 30 Jul 2017 01:27:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-Joining-Datasets/m-p/384044#M91641</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-07-30T01:27:13Z</dc:date>
    </item>
  </channel>
</rss>

