<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicate/Close Match values across records in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316115#M69033</link>
    <description>&lt;P&gt;Here is one very simplistic approach to show what might be done. Note that the join compares every value with every other value, so with a large data set this can take time. If the data may contain duplicate values, you should likely reduce it to one record per value first. Also, you will get both A &amp;lt;-&amp;gt; B and B &amp;lt;-&amp;gt; A result pairs. An exercise for the interested reader is reducing those to one side only.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
   input ClientID $;
datalines;
ABC123
ABC-123
DEF123
DEF-123
;
run;

proc sql;
   create table spelldist as
   select a.clientid, b.clientid as OtherVal,
          compged(a.clientid,b.clientid) as Compdist,
          spedis(a.clientid,b.clientid) as spedist
   from have as a, have as b
   where a.clientid ne b.clientid
   ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 01 Dec 2016 23:26:07 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2016-12-01T23:26:07Z</dc:date>
    <item>
      <title>Duplicate/Close Match values across records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316041#M69002</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;Would someone have ideas on how to determine if there are similar looking values for a variable across records?&lt;/P&gt;
&lt;P&gt;For example, in the following sample dataset (one variable), is there a series of steps or a function that reports whether a similar-looking client ID exists elsewhere in the dataset and, if so, what it is?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Client ID&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;
&lt;P&gt;ABC123&lt;/P&gt;
&lt;P&gt;ABC-123&lt;/P&gt;
&lt;P&gt;DEF123&lt;/P&gt;
&lt;P&gt;DEF-123&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please let me know if I can be clearer. Thank you!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 18:47:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316041#M69002</guid>
      <dc:creator>Maisha_Huq</dc:creator>
      <dc:date>2016-12-01T18:47:27Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate/Close Match values across records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316057#M69013</link>
      <description>&lt;P&gt;It depends on what the issue really is. If added characters are the problem, then you simply need to remove them (look at the COMPRESS function). If the problem is spelling variations, then look at the spelling distance functions (SPEDIS, COMPLEV, COMPGED). For a phonetic comparison, look at SOUNDEX.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 19:25:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316057#M69013</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-12-01T19:25:46Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate/Close Match values across records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316058#M69014</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use the COMPRESS function to give the Client Id values a consistent format; after that, finding the similar or identical IDs is easy.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data TableName;&lt;/P&gt;&lt;P&gt;set TableName;&lt;/P&gt;&lt;P&gt;'Client Id'n = Compress('Client Id'n, '-'); /* remove '-' from Client Id */&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 19:29:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316058#M69014</guid>
      <dc:creator>Flexron</dc:creator>
      <dc:date>2016-12-01T19:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate/Close Match values across records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316079#M69023</link>
      <description>&lt;P&gt;Thank you, PGStats.&amp;nbsp; But wouldn't functions such as the spelling distance functions compare two values within one record, rather than two values of the same variable across records?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2016 21:08:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316079#M69023</guid>
      <dc:creator>Maisha_Huq</dc:creator>
      <dc:date>2016-12-01T21:08:14Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicate/Close Match values across records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316115#M69033</link>
      <description>&lt;P&gt;Here is one very simplistic approach to show what might be done. Note that the join compares every value with every other value, so with a large data set this can take time. If the data may contain duplicate values, you should likely reduce it to one record per value first. Also, you will get both A &amp;lt;-&amp;gt; B and B &amp;lt;-&amp;gt; A result pairs. An exercise for the interested reader is reducing those to one side only.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
   input ClientID $;
datalines;
ABC123
ABC-123
DEF123
DEF-123
;
run;

proc sql;
   create table spelldist as
   select a.clientid, b.clientid as OtherVal,
          compged(a.clientid,b.clientid) as Compdist,
          spedis(a.clientid,b.clientid) as spedist
   from have as a, have as b
   where a.clientid ne b.clientid
   ;
quit;
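
/* A possible sketch of the two refinements mentioned above; this is one
   option under stated assumptions, not the only way to do it.
   Assumptions: the output table name spelldist_oneside is hypothetical,
   and keeping the alphabetically smaller value on the left is an
   arbitrary choice.
   - SELECT DISTINCT reduces the input to one record per value.
   - The LT comparison keeps each pair once (A to B, but not B to A).
   Caveat: SPEDIS in particular is not symmetric, so keeping one side
   only also drops the score for the reverse direction. */
proc sql;
   create table spelldist_oneside as
   select a.clientid, b.clientid as OtherVal,
          compged(a.clientid,b.clientid) as Compdist,
          spedis(a.clientid,b.clientid) as spedist
   from (select distinct clientid from have) as a,
        (select distinct clientid from have) as b
   where a.clientid lt b.clientid
   ;
quit;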
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 01 Dec 2016 23:26:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicate-Close-Match-values-across-records/m-p/316115#M69033</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-12-01T23:26:07Z</dc:date>
    </item>
  </channel>
</rss>

