<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merging two datasets with no obvious identifier/s in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746655#M234250</link>
    <description>&lt;P&gt;I've found using multiple matching strategies is a good approach. Start by using exact matches between ALL common variables, although I'd uppercase all of the character variables first. Then I'd look at what is left and then match on first name, surname, DOB and sex and see how that works out, then maybe surname, DOB and sex. You then choose the match with the highest accuracy - most variables matching. Leave fuzzy matches to last and only apply them to the remaining unmatched data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By chipping away at the problem starting with the most accurate matching strategy and working your way down to the least accurate you can often make surprising progress.&lt;/P&gt;</description>
    <pubDate>Wed, 09 Jun 2021 05:24:40 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2021-06-09T05:24:40Z</dc:date>
    <item>
      <title>Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746641#M234239</link>
      <description>&lt;P&gt;I have two datasets that I am working with that need to be merged.&amp;nbsp; The potential problem is that neither dataset has what I would call an obvious choice for a sort / by variable.&amp;nbsp; These are a REALD data set of individuals who are identified&amp;nbsp; by REALD measures and Medicaid program data set (like TANF and SNAP) I will only give a fictionalized couple of rows from each:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Orpheus dataset (data set with respondents to disability questions on&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; survey)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;FirstNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LastNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DOB&amp;nbsp; &amp;nbsp; &amp;nbsp; Sex&amp;nbsp;&amp;nbsp;&amp;nbsp; MiddleNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; City&lt;/P&gt;
&lt;P&gt;James&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Eastwood&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 09/19/75&amp;nbsp;&amp;nbsp;&amp;nbsp; M&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; A&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Medford&lt;/P&gt;
&lt;P&gt;Alishay&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Connell&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 03/06/91&amp;nbsp;&amp;nbsp;&amp;nbsp; F&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Eugene&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ONE dataset (Medicaid program)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;FirstNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LastNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DOB&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; Sex&amp;nbsp;&amp;nbsp;&amp;nbsp; MiddleNm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; City&lt;/P&gt;
&lt;P&gt;Daniel&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hart&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 07/18/80&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; M&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Patrick&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Baker City&lt;/P&gt;
&lt;P&gt;Jade&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 01/02/88&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; F&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Portland&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There are missing values (like LastNm) in the second field above.&amp;nbsp; Also the raw some of the raw data has extraneous entries like commas and&lt;/P&gt;
&lt;P&gt;single and double quotes especially around names.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have looked over some SAS articles LexJansen.com but not quite what I was hoping to find.&amp;nbsp; What I'd like to find is a straightforward way to merge these two datasets that have some problem data entries and missing values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What would be the best method to achieve a merge of these datasets?&lt;/P&gt;
&lt;P&gt;Concatenating two fields then sorting and using the concatenated variable to perform the merge?&amp;nbsp; or Adding some type of indicator variable to each dataset to serve the sorting and merging by requirements?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your thoughts and help are much valued, thank you.&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 02:17:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746641#M234239</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-09T02:17:44Z</dc:date>
    </item>
    <item>
      <title>Re: Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746644#M234241</link>
      <description>&lt;P&gt;I've done a little of this type of "fuzzy" matching where you don't have a "hard" identifier like SSN or HIC number using the COMPGED function.&amp;nbsp; You kind of have to experiment and do some trial and error.&amp;nbsp; No matter what you do, there are going to be some false matches and also some that should be matched but are not.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This paper might be helpful:&amp;nbsp;&amp;nbsp;&lt;A href="https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/" target="_blank"&gt;https://blogs.sas.com/content/sgf/2015/01/27/how-to-perform-a-fuzzy-match-using-sas-functions/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 02:35:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746644#M234241</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2021-06-09T02:35:50Z</dc:date>
    </item>
    <item>
      <title>Re: Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746653#M234248</link>
      <description>&lt;P&gt;You may get more answers, if you&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;post data in usable form (dataset with datalines, no attachments)&lt;/LI&gt;
&lt;LI&gt;and show what you expect as result&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 09 Jun 2021 05:14:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746653#M234248</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-06-09T05:14:46Z</dc:date>
    </item>
    <item>
      <title>Re: Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746655#M234250</link>
      <description>&lt;P&gt;I've found using multiple matching strategies is a good approach. Start by using exact matches between ALL common variables, although I'd uppercase all of the character variables first. Then I'd look at what is left and then match on first name, surname, DOB and sex and see how that works out, then maybe surname, DOB and sex. You then choose the match with the highest accuracy - most variables matching. Leave fuzzy matches to last and only apply them to the remaining unmatched data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By chipping away at the problem starting with the most accurate matching strategy and working your way down to the least accurate you can often make surprising progress.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 05:24:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746655#M234250</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2021-06-09T05:24:40Z</dc:date>
    </item>
    <item>
      <title>Re: Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746657#M234252</link>
      <description>&lt;P&gt;I really like the iterative approach.&amp;nbsp; That sounds like it could provide a very complete approach - one that I can share with the researchers who eventually will use the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 05:40:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746657#M234252</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-09T05:40:22Z</dc:date>
    </item>
    <item>
      <title>Re: Merging two datasets with no obvious identifier/s</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746658#M234253</link>
      <description>&lt;P&gt;Thank you for the link to the paper - looks to be helpful.&lt;/P&gt;
&lt;P&gt;Also I appreciate your iterative approach to exploring this challenge. That iterative approach is very closely related to the response from SASkiwi.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your help. I really appreciate it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 05:49:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merging-two-datasets-with-no-obvious-identifier-s/m-p/746658#M234253</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-09T05:49:00Z</dc:date>
    </item>
  </channel>
</rss>

