<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I select distinct on multiple variables using PROC SQL in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553800#M154024</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271810"&gt;@Paige1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N" 
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run this code, it underlines ID with a 22 and puts:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ERROR 22-322: Syntax error, expecting one of the following: !, !!, &amp;amp;, (, *, **, +, ',', -, '.', /, &amp;lt;, &amp;lt;=, &amp;lt;&amp;gt;, =, &amp;gt;, &amp;gt;=, ?, AND, AS,&lt;BR /&gt;CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS 9.4&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you very much&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Distinct to use for the purpose you intend has to be before the variables and applies to all variables on the SELECT clause. Since you have distinct following RSP then the compiler thinks that distinct is a variable you are selecting and is missing a comma or some operator between distinct and id.&lt;/P&gt;
&lt;P&gt;Likely you would get what you intend with&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Select distinct rsp, id, par&lt;/P&gt;</description>
    <pubDate>Wed, 24 Apr 2019 22:09:27 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2019-04-24T22:09:27Z</dc:date>
    <item>
      <title>How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553791#M154018</link>
      <description>&lt;P&gt;I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N" 
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I run this code, it underlines ID with a 22 and puts:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ERROR 22-322: Syntax error, expecting one of the following: !, !!, &amp;amp;, (, *, **, +, ',', -, '.', /, &amp;lt;, &amp;lt;=, &amp;lt;&amp;gt;, =, &amp;gt;, &amp;gt;=, ?, AND, AS,&lt;BR /&gt;CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS 9.4&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much&lt;/P&gt;</description>
      <pubDate>Wed, 24 Apr 2019 20:55:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553791#M154018</guid>
      <dc:creator>Paige1</dc:creator>
      <dc:date>2019-04-24T20:55:29Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553792#M154019</link>
      <description>&lt;P&gt;Why not simply:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=oldtable.fy18(keep=RSP ID PAR) nodupkey out=newtable;
   by ID PAR;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Apr 2019 21:04:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553792#M154019</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-04-24T21:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553794#M154020</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31304"&gt;@PeterClemmensen&lt;/a&gt;&amp;nbsp;The oldtable is VERY large and would take a lot of work to sort.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Apr 2019 21:11:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553794#M154020</guid>
      <dc:creator>Paige1</dc:creator>
      <dc:date>2019-04-24T21:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553800#M154024</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271810"&gt;@Paige1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I am trying to create a dataset based on a much bigger dataset that has many more rows and columns. I only want to keep the columns RSP, ID, and PAR. And I only want the rows that have unique/distinct ID and PAR combinations. But the code I have keeps returning errors:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table newtable as
select RSP, distinct ID, PAR
from oldtable.fy18
where RSP="N" 
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run this code, it underlines ID with a 22 and puts:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ERROR 22-322: Syntax error, expecting one of the following: !, !!, &amp;amp;, (, *, **, +, ',', -, '.', /, &amp;lt;, &amp;lt;=, &amp;lt;&amp;gt;, =, &amp;gt;, &amp;gt;=, ?, AND, AS,&lt;BR /&gt;CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS 9.4&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you very much&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Distinct to use for the purpose you intend has to be before the variables and applies to all variables on the SELECT clause. Since you have distinct following RSP then the compiler thinks that distinct is a variable you are selecting and is missing a comma or some operator between distinct and id.&lt;/P&gt;
&lt;P&gt;Likely you would get what you intend with&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Select distinct rsp, id, par&lt;/P&gt;</description>
      <pubDate>Wed, 24 Apr 2019 22:09:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553800#M154024</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-04-24T22:09:27Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553815#M154032</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271810"&gt;@Paige1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31304"&gt;@PeterClemmensen&lt;/a&gt;&amp;nbsp;The oldtable is VERY large and would take a lot of work to sort.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You are going to have to sort to find the distinct values.&amp;nbsp; PROC SORT is probably the fastest way to sort.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Apr 2019 23:37:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553815#M154032</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-04-24T23:37:14Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553899#M154067</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271810"&gt;@Paige1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could simply&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select distinct ID, PAR&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;because, given your WHERE condition, the RSP values would be redundant anyway.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In a quick test on my computer with 100 million observations (and 4 variables) containing 2 million distinct ID-PAR combinations (for RSP='N') in random order the PROC SQL step took 53 s. PROC SORT with NODUPKEY option was faster (46 s) and a DATA step using a hash object (as shown below) was slower (59 s, without "&lt;FONT face="courier new,courier"&gt;ordered:'y'&lt;/FONT&gt;" 57 s).&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
if 0 then set fy18;
dcl hash h(dataset:'fy18(where=(rsp="N"))', ordered:'y', hashexp:20);
h.definekey('id', 'par');
h.definedone();
h.output(dataset:'newtable');
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 25 Apr 2019 09:12:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/553899#M154067</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2019-04-25T09:12:35Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/554604#M154299</link>
      <description>&lt;P&gt;select distinct ID, Par worked.&lt;/P&gt;&lt;P&gt;I didn't realize that if you have a where statement, it will include that variable in the table you're creating. Thank you all!&lt;/P&gt;</description>
      <pubDate>Sun, 28 Apr 2019 16:58:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/554604#M154299</guid>
      <dc:creator>Paige1</dc:creator>
      <dc:date>2019-04-28T16:58:06Z</dc:date>
    </item>
    <item>
      <title>Re: How do I select distinct on multiple variables using PROC SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/554610#M154303</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/271810"&gt;@Paige1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I didn't realize that if you have a where statement, it will include that variable in the table you're creating.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;No, this is not the case. With "redundant" I meant that all (possibly millions of) observations in dataset NEWTABLE would contain the &lt;EM&gt;same&lt;/EM&gt; value of RSP (namely "N") and you would &lt;EM&gt;know&lt;/EM&gt; this value beforehand because you specified it in the WHERE clause.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Apr 2019 18:10:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-select-distinct-on-multiple-variables-using-PROC-SQL/m-p/554610#M154303</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2019-04-28T18:10:21Z</dc:date>
    </item>
  </channel>
</rss>

