<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select Distinct Duplicate values in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362666#M23816</link>
    <description>&lt;P&gt;Names are often spelled differently, when entered more than once. &amp;nbsp;As a result, I suggest you go about this as a two-step problem. &amp;nbsp;First, find all the differences:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table differences as&lt;/P&gt;
&lt;P&gt;select a.*, b.customer_name as matching_name&lt;/P&gt;
&lt;P&gt;from have a, have b&lt;/P&gt;
&lt;P&gt;where a.mobile_number = b.mobile_number&lt;/P&gt;
&lt;P&gt;and a.customer_name ne&amp;nbsp;b.customer_name;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data differences2;&lt;/P&gt;
&lt;P&gt;set differences;&lt;/P&gt;
&lt;P&gt;if length(customer_name) le length(matching_name) then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;dummy = customer_name;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;customer_name = matching_name;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;matching_name = dummy;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;drop dummy;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In DIFFERENCES2, CUSTOMER_NAME and MATCHING_NAME will be different, but the DATA step guarantees that CUSTOMER_NAME will be the variation that contains more characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then you can begin to automate the process of removing matches that should be removed. &amp;nbsp;Due to variations in spelling,&amp;nbsp;you may need more than one pass through the data. &amp;nbsp;For example, does "Sam" match "Samuel"? &amp;nbsp;Should those be output? &amp;nbsp;Here is one pass through to eliminate the duplicates:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set differences2;&lt;/P&gt;
&lt;P&gt;if index(customer_name, strip(matching_name)) = 0;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of the INDEX function, you may want to use FINDW.&amp;nbsp; It really depends on the nature of the data, as to which function gives you better results.&lt;/P&gt;</description>
    <pubDate>Tue, 30 May 2017 10:46:43 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2017-05-30T10:46:43Z</dc:date>
    <item>
      <title>Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362653#M23811</link>
      <description>&lt;P&gt;hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i need help. I have a dataset, sample below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CUSTOMER_NAME &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; MOBILE_NUMBER&lt;/P&gt;&lt;P&gt;John &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;555-777&lt;/P&gt;&lt;P&gt;Sam &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 556-865&lt;/P&gt;&lt;P&gt;Joe &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;555-000&lt;/P&gt;&lt;P&gt;Sam+Joe &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;556-865&lt;/P&gt;&lt;P&gt;Tim &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 555-777&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;want to output customers with same mobile number but different names. in the data above we can see that Sam and Sam+Joe have the same number, this means Sam is Joe's partner so they can have the same number. but John and Tim are not connected and have the same number, those are the guys i want to output&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thanks in advance&lt;/P&gt;</description>
      <pubDate>Tue, 30 May 2017 09:11:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362653#M23811</guid>
      <dc:creator>umar_milanzi</dc:creator>
      <dc:date>2017-05-30T09:11:07Z</dc:date>
    </item>
    <item>
      <title>Re: Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362657#M23812</link>
      <description>&lt;P&gt;Post test data in the form of a datastep.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe something like (and can't test as not test data):&lt;/P&gt;
&lt;PRE&gt;proc sql;
  create table WANT as
  select  distinct
          MOBILE_NUMBER,
          CUSTOMER_NAME
  from    HAVE
  group by MOBILE_NUMBER,
           CUSTOMER_NAME
  having count(*) gt 1;
quit;
  &lt;/PRE&gt;</description>
      <pubDate>Tue, 30 May 2017 09:44:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362657#M23812</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-05-30T09:44:27Z</dc:date>
    </item>
    <item>
      <title>Re: Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362659#M23813</link>
      <description>&lt;P&gt;&lt;BR /&gt;data have;&lt;BR /&gt;input customer_name $ Mobile_Number ;&lt;BR /&gt;&lt;BR /&gt;datalines;&lt;BR /&gt;john 555777&lt;BR /&gt;sam 556865&lt;BR /&gt;joe 555888&lt;BR /&gt;sam+joe 556865&lt;BR /&gt;tim 555777&lt;BR /&gt;; run;&lt;/P&gt;</description>
      <pubDate>Tue, 30 May 2017 10:03:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362659#M23813</guid>
      <dc:creator>umar_milanzi</dc:creator>
      <dc:date>2017-05-30T10:03:10Z</dc:date>
    </item>
    <item>
      <title>Re: Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362661#M23814</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;but John and Tim are not connected and have the same number, those are the guys i want to output"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You mean you do not want to output ?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 May 2017 10:21:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362661#M23814</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-05-30T10:21:11Z</dc:date>
    </item>
    <item>
      <title>Re: Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362666#M23816</link>
      <description>&lt;P&gt;Names are often spelled differently, when entered more than once. &amp;nbsp;As a result, I suggest you go about this as a two-step problem. &amp;nbsp;First, find all the differences:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table differences as&lt;/P&gt;
&lt;P&gt;select a.*, b.customer_name as matching_name&lt;/P&gt;
&lt;P&gt;from have a, have b&lt;/P&gt;
&lt;P&gt;where a.mobile_number = b.mobile_number&lt;/P&gt;
&lt;P&gt;and a.customer_name ne&amp;nbsp;b.customer_name;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data differences2;&lt;/P&gt;
&lt;P&gt;set differences;&lt;/P&gt;
&lt;P&gt;if length(customer_name) le length(matching_name) then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;dummy = customer_name;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;customer_name = matching_name;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;matching_name = dummy;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;drop dummy;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In DIFFERENCES2, CUSTOMER_NAME and MATCHING_NAME will be different, but the DATA step guarantees that CUSTOMER_NAME will be the variation that contains more characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then you can begin to automate the process of removing matches that should be removed. &amp;nbsp;Due to variations in spelling,&amp;nbsp;you may need more than one pass through the data. &amp;nbsp;For example, does "Sam" match "Samuel"? &amp;nbsp;Should those be output? &amp;nbsp;Here is one pass through to eliminate the duplicates:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set differences2;&lt;/P&gt;
&lt;P&gt;if index(customer_name, strip(matching_name)) = 0;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of the INDEX function, you may want to use FINDW.&amp;nbsp; It really depends on the nature of the data, as to which function gives you better results.&lt;/P&gt;</description>
      <pubDate>Tue, 30 May 2017 10:46:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362666#M23816</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2017-05-30T10:46:43Z</dc:date>
    </item>
    <item>
      <title>Re: Select Distinct Duplicate values</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362667#M23817</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;this looks like it is doing what i want.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks everyone for the inputs.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 May 2017 10:49:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Select-Distinct-Duplicate-values/m-p/362667#M23817</guid>
      <dc:creator>umar_milanzi</dc:creator>
      <dc:date>2017-05-30T10:49:09Z</dc:date>
    </item>
  </channel>
</rss>

