<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Distinct on Select Columns in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174690#M3488</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;EM&gt;Would dupout and nodupkey not suffice?&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Yes, of course, but with doing it "manually" you can have additional control which records are preferred to keep (eg first or last in time etc). One can use additional vars in the sort to set the preference.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;Spock&amp;gt; IMO it is easier to convey the developer's intent with the data step method. IMO. &amp;lt;/Spock&amp;gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 28 May 2014 11:41:18 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2014-05-28T11:41:18Z</dc:date>
    <item>
      <title>Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174686#M3484</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;DIS 4.4&lt;/P&gt;&lt;P&gt;SAS 9.3&lt;/P&gt;&lt;P&gt;New Developer&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So, I've got a dataset and I want to 'select distinct' only on a subset of variables. If the observation is distinct, then I want all of it's variables to pass through. For example, I have 30 variables, but I only want to check 5 of them and if those 5 are the same between two (or more) observations, than I only want one of the observations to go through, but the values for all 30 variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would guess I would use an Extract transformation with the option of select distinct set to 'Yes'. However, this isn't doing it. When I only map the 5 variables to the outgoing dataset, then it works, but I only have 5 variables per observation and not the 30 I desire.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did I make any sense? Basically:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I have Var_1, Var_2, Var_3, ...Var_30, but want to only select on Var_3, Var_5, Var_6, Var_10, and Var_22. I think doing a 'group by' could also do the trick, but also doesn't seem to:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Select&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_1,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_2,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_3,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ...&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_30&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;from TABLE&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;group by&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_3,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_5,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_6,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_10,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Var_22&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I still end up with the same number of observations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, is there a way to output a dataset (even a temporary one) of all observations that were not distinct so that I can view that group as well? Say if I have 200 observations, but after selecting distinct, I have 175...is there a node/transformation I can use to view those 25 that were 'rejected'?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 May 2014 19:54:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174686#M3484</guid>
      <dc:creator>jwhite</dc:creator>
      <dc:date>2014-05-27T19:54:35Z</dc:date>
    </item>
    <item>
      <title>Re: Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174687#M3485</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, jwhite&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"Select distinct" doesn't do what you need; what it does is resolve the SQL request to the result, and then eliminate duplicates from that result, returning one row for every distinct combination of variables in the result. I'm not the world's biggest SQL expert, I suspect somebody else will come to your rescue with the correct code, but at least you know the reason it didn't work!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 May 2014 21:34:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174687#M3485</guid>
      <dc:creator>TomKari</dc:creator>
      <dc:date>2014-05-27T21:34:21Z</dc:date>
    </item>
    <item>
      <title>Re: Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174688#M3486</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You have more control with the data step.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's assume you have var1 to var5, and var1 to var3 are your "criterion variables".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=have;&lt;/P&gt;&lt;P&gt;by var1 var2 var3;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data&lt;/P&gt;&lt;P&gt;&amp;nbsp; want&lt;/P&gt;&lt;P&gt;&amp;nbsp; rejected&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;set have;&lt;/P&gt;&lt;P&gt;by var1 var2 var3;&lt;/P&gt;&lt;P&gt;if first.var3&lt;/P&gt;&lt;P&gt;then output want;&lt;/P&gt;&lt;P&gt;else output rejected; *these are the duplicates;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 May 2014 07:21:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174688#M3486</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2014-05-28T07:21:08Z</dc:date>
    </item>
    <item>
      <title>Re: Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174689#M3487</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Would dupout and nodupkey not suffice?:&lt;/P&gt;&lt;P&gt;data have;&lt;BR /&gt;&amp;nbsp; attrib var1-var5 format=$20.;&lt;BR /&gt;&amp;nbsp; infile cards;&lt;BR /&gt;&amp;nbsp; input var1-var5;&lt;BR /&gt;cards;&lt;BR /&gt;aaa bbb ccc ddd eee fff&lt;BR /&gt;aaa bbb ccc ddd eee fff&lt;BR /&gt;bbb bbb ccc ddd eee fff&lt;BR /&gt;ccc bbb ccc ddd eee fff&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc sort data=have out=want dupout=rejected nodupkey;&lt;BR /&gt;&amp;nbsp; by var1 var2 var3;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;---&lt;/P&gt;&lt;P&gt;To note, you can also do it in SQL, unlike Kurt's solution you would have two steps, one for WANT, one for REJECTED:&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table WANT as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select&amp;nbsp; distinct(VAR1||VAR2||VAR3),&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&lt;/P&gt;&lt;P&gt;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp; HAVE;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table REJECTED as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select&amp;nbsp; distinct(VAR1||VAR2||VAR3),&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&lt;/P&gt;&lt;P&gt;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp; (select *,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; COUNT(VAR1||VAR2||VAR3) as CNT&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from&amp;nbsp;&amp;nbsp; HAVE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by VAR1||VAR2||VAR3)&lt;/P&gt;&lt;P&gt;&amp;nbsp; where&amp;nbsp;&amp;nbsp; CNT &amp;gt; 1;&lt;/P&gt;&lt;P&gt;quit; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 May 2014 08:32:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174689#M3487</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-05-28T08:32:38Z</dc:date>
    </item>
    <item>
      <title>Re: Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174690#M3488</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;EM&gt;Would dupout and nodupkey not suffice?&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Yes, of course, but with doing it "manually" you can have additional control which records are preferred to keep (eg first or last in time etc). One can use additional vars in the sort to set the preference.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;lt;Spock&amp;gt; IMO it is easier to convey the developer's intent with the data step method. IMO. &amp;lt;/Spock&amp;gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 May 2014 11:41:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174690#M3488</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2014-05-28T11:41:18Z</dc:date>
    </item>
    <item>
      <title>Re: Distinct on Select Columns</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174691#M3489</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If I understand what you mean . you can do something like :&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;select distinct catx(' ',Var_3, Var_5, Var_6, Var_10,&amp;nbsp; Var_22)&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 May 2014 11:57:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Distinct-on-Select-Columns/m-p/174691#M3489</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2014-05-28T11:57:49Z</dc:date>
    </item>
  </channel>
</rss>

