<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values. in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592838#M170032</link>
    <description>Well, when you have duplicates it brings in all the duplicates. Perhaps you need to add another condition when joining that table? Or is there a specific record that makes sense to bring in?</description>
    <pubDate>Mon, 30 Sep 2019 21:44:55 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2019-09-30T21:44:55Z</dc:date>
    <item>
      <title>Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592831#M170027</link>
      <description>&lt;P&gt;I am trying to join 6 datasets and using proc sql and the obs are supposed to be 5000 but i am getting 5020. When i use proc sort with nodupkey after that, then i get 5000 obs. Why I am not getting the correct answer even after using DISTINCT in proc sql? Can anyone please help? Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;/*Answer should be 5000 obs but below is giving me 5020 obs.*/&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table new as&lt;BR /&gt;select &lt;STRONG&gt;distinct&lt;/STRONG&gt;(demossubmit.custid), *&lt;BR /&gt;from task1.demossubmit join task1.political&lt;BR /&gt;on demossubmit.custid=political.custid&lt;BR /&gt;join task1.response&lt;BR /&gt;on political.custid=response.custid&lt;BR /&gt;join task1.gadgets&lt;BR /&gt;on response.custid=gadgets.custid&lt;BR /&gt;join task1.financial&lt;BR /&gt;on gadgets.custid=financial.custid&lt;BR /&gt;join task1.pets&lt;BR /&gt;on financial.custid=pets.custid;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;/*Below is giving me 5000 obs. but i dont want to this extra step*/&lt;/P&gt;&lt;P&gt;proc sort data=new out=new1 &lt;STRONG&gt;nodupkey&lt;/STRONG&gt;;&lt;BR /&gt;by custid;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:18:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592831#M170027</guid>
      <dc:creator>meetagupta</dc:creator>
      <dc:date>2019-09-30T21:18:29Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592832#M170028</link>
      <description>&lt;P&gt;Try dropping the parentheses&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;distinct &lt;/STRONG&gt;demossubmit.custid, ...&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:21:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592832#M170028</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-09-30T21:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592834#M170029</link>
      <description>It usually means you have mulitple CUSTID in one of your 6 data sets. &lt;BR /&gt;Usually you want to remove one from the data set in question so you should first find out which data set has duplicates and then figure out which record should be used. &lt;BR /&gt;&lt;BR /&gt;You can find which tables have duplicates by checking the counts vs count distinct. &lt;BR /&gt;&lt;BR /&gt;select count(*) as N, count(distinct custid) as N_Distinct&lt;BR /&gt;from task1.demossubmit;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:34:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592834#M170029</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-09-30T21:34:31Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592837#M170031</link>
      <description>Thanks for replying. All the 6 datasets have custid in common. The dataset&lt;BR /&gt;demossubmit have duplicate custid.&lt;BR /&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:43:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592837#M170031</guid>
      <dc:creator>meetagupta</dc:creator>
      <dc:date>2019-09-30T21:43:32Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592838#M170032</link>
      <description>Well, when you have duplicates it brings in all the duplicates. Perhaps you need to add another condition when joining that table? Or is there a specific record that makes sense to bring in?</description>
      <pubDate>Mon, 30 Sep 2019 21:44:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592838#M170032</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-09-30T21:44:55Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592839#M170033</link>
      <description>tried doing that but still same answer...&lt;BR /&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:45:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592839#M170033</guid>
      <dc:creator>meetagupta</dc:creator>
      <dc:date>2019-09-30T21:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592840#M170034</link>
      <description>What other conditions can I apply while joining? I just dont need any&lt;BR /&gt;duplicate observations.&lt;BR /&gt;</description>
      <pubDate>Mon, 30 Sep 2019 21:49:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592840#M170034</guid>
      <dc:creator>meetagupta</dc:creator>
      <dc:date>2019-09-30T21:49:32Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592846#M170037</link>
      <description>WHY DO YOU HAVE DUPLICATES in the first place? &lt;BR /&gt;&lt;BR /&gt;You need to figure out which record is the correct one to join on. It's very rare that it won't matter. For example, if a person tried to fill out a survey twice, we default to the last set of values. You first have to understand why you have duplicates, how the duplicates can be uniquely identified and then you'll know how to filter them. It's a subject matter problem, not a technical problem. &lt;BR /&gt;&lt;BR /&gt;If you're 100% sure it doesn't matter, run PROC SORT on the table with duplicates prior to the join and remove duplicates and then join it. You can create a different output data set when using PROC SORT and I highly recommend you do that.</description>
      <pubDate>Mon, 30 Sep 2019 22:11:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592846#M170037</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-09-30T22:11:03Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592901#M170066</link>
      <description>&lt;P&gt;There is no equivalent to&amp;nbsp;&lt;STRONG&gt;nodupkey&lt;/STRONG&gt; in SQL&lt;STRONG&gt;,&amp;nbsp;&lt;/STRONG&gt;where we pick any one observation for a key value.&lt;/P&gt;
&lt;P&gt;If you cannot avoid the duplicate in the source table, this is a better way to deal with them:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table new as
select distinct(demossubmit.custid), *
from task1.demossubmit join task1.political
on demossubmit.custid=political.custid
join task1.response
on political.custid=response.custid
join task1.gadgets
on response.custid=gadgets.custid
join task1.financial
on gadgets.custid=financial.custid
join task1.pets
on financial.custid=pets.custid
&lt;STRONG&gt;order by CUSTID;
&lt;/STRONG&gt;quit;

data NEW1; 
  set NEW;
  by CUSTID;
  if first.CUSTID;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 03:34:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592901#M170066</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-10-01T03:34:49Z</dc:date>
    </item>
    <item>
      <title>Re: Join 6 datasets using Proc SQL on custid and dont want any duplicate values.</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592905#M170068</link>
      <description>&lt;P&gt;The distinct predicate requires rows that are completely distinct, not just distinct custid. So you must figure out in what way those duplicate custid rows differ. Suppose duplicate custid rows have different timestamps. That makes them distinct rows. To keep only the row with the latest timestamp, you could add to your query:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;group by custid&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;having timestamp = max(timestamp);&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 03:39:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-6-datasets-using-Proc-SQL-on-custid-and-dont-want-any/m-p/592905#M170068</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-10-01T03:39:00Z</dc:date>
    </item>
  </channel>
</rss>

