<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic proc sql and data step for testing duplications are giving different numbers in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563570#M158006</link>
    <description>&lt;P&gt;Hi everybody, I'm trying to find the duplicates in a data set using two different programs, but they give different numbers of observations. Could somebody explain why? Here is the code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table SINGLE as
    select unique VARIABLE from DATASET;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;or&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=DATASET;
  by VARIABLE;
run;

data dup nodup blank;
  set DATASET;
  by VARIABLE;
  if first.VARIABLE and last.VARIABLE then output nodup;
  else if VARIABLE="" then output blank;
  else output dup;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But table SINGLE and data set DUP have different numbers of observations.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Tue, 04 Jun 2019 17:33:33 GMT</pubDate>
    <dc:creator>mamin088</dc:creator>
    <dc:date>2019-06-04T17:33:33Z</dc:date>
    <item>
      <title>proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563570#M158006</link>
      <description>&lt;P&gt;Hi everybody, I'm trying to find the duplicates in a data set using two different programs, but they give different numbers of observations. Could somebody explain why? Here is the code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table SINGLE as
    select unique VARIABLE from DATASET;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;or&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=DATASET;
  by VARIABLE;
run;

data dup nodup blank;
  set DATASET;
  by VARIABLE;
  if first.VARIABLE and last.VARIABLE then output nodup;
  else if VARIABLE="" then output blank;
  else output dup;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But table SINGLE and data set DUP have different numbers of observations.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 17:33:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563570#M158006</guid>
      <dc:creator>mamin088</dc:creator>
      <dc:date>2019-06-04T17:33:33Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563571#M158007</link>
      <description>&lt;P&gt;Your two programs are doing completely different things.&lt;/P&gt;
&lt;P&gt;To replicate what your SQL code is doing, just use FIRST.VARIABLE.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dup nodup blank single2(keep=variable);
  set DATASET;
  by VARIABLE;

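  /* First row of each value: one observation per distinct value, like the SQL query */
  if first.VARIABLE then output single2;

  /* Only values that occur exactly once go to NODUP */
  if first.VARIABLE and last.VARIABLE then output nodup;
  else if VARIABLE="" then output blank;
  else output dup;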

run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 04 Jun 2019 17:40:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563571#M158007</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-06-04T17:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563572#M158008</link>
      <description>&lt;P&gt;That is because this code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if first.VARIABLE and last.VARIABLE then output nodup;
else if VARIABLE="" then output blank;
else output dup;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;does not do the same thing as this code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;unique VARIABLE&lt;/CODE&gt;&lt;/PRE&gt;
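&lt;P&gt;If the goal is to get the data step's split out of PROC SQL instead, one option is to count how often each value occurs. A minimal sketch, assuming a small hypothetical HAVE table and hypothetical output tables ONCE and REPEATED:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical sample data: six values, two of which repeat */
data have;
  input variable @@;
cards;
1 1 2 3 3 4
;

proc sql;
  /* Values that occur exactly once: the same values the data step writes to NODUP */
  create table once as
    select variable
      from have
      group by variable
      having count(*) = 1;

  /* Values that occur more than once, one row per value with its count */
  create table repeated as
    select variable, count(*) as n_obs
      from have
      group by variable
      having count(*) gt 1;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;REPEATED has one row per duplicated value, whereas the data step's DUP keeps every observation of such a value, so the two counts will not match.&lt;/P&gt;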
&lt;P&gt;Why are you using UNIQUE rather than DISTINCT?&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 17:41:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563572#M158008</guid>
      <dc:creator>VDD</dc:creator>
      <dc:date>2019-06-04T17:41:08Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563576#M158012</link>
      <description>Thank you! I did that, and you are right:&lt;BR /&gt;if first.VARIABLE then output single2;&lt;BR /&gt;replicates the SQL code. But why? I don't understand the difference. SQL is supposed to give me the distinct observations.</description>
      <pubDate>Tue, 04 Jun 2019 17:51:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563576#M158012</guid>
      <dc:creator>mamin088</dc:creator>
      <dc:date>2019-06-04T17:51:58Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563577#M158013</link>
      <description>Thank you VDD! I just searched and found that UNIQUE in SQL is old, non-standard syntax, so it is better to use DISTINCT. Thank you again!</description>
      <pubDate>Tue, 04 Jun 2019 17:52:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563577#M158013</guid>
      <dc:creator>mamin088</dc:creator>
      <dc:date>2019-06-04T17:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563582#M158018</link>
      <description>&lt;P&gt;DISTINCT gives you distinct OBSERVATIONS. But since your SELECT statement has only one variable, the effect is distinct VALUES of that variable: no matter how many times a value appears, the output includes it only once.&lt;/P&gt;
&lt;P&gt;In your data step, FIRST.VARIABLE and LAST.VARIABLE are both true only for values of VARIABLE that appear exactly ONCE in the data. Every observation of a value that appears multiple times is written to one of the other data sets (DUP, or BLANK for a repeated missing value).&lt;/P&gt;
&lt;P&gt;Consider this example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input variable ;
cards;
1
1
2
3
3
4
;&lt;/CODE&gt;&lt;/PRE&gt;
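&lt;P&gt;A minimal sketch that runs both approaches on these same six values (re-created compactly so the sketch runs on its own). Two assumptions differ from the question: DISTINCT is used instead of UNIQUE (PROC SQL documents UNIQUE as an alias for DISTINCT), and the VARIABLE="" test is swapped for MISSING(), which also works for a numeric variable like this one.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Same six values as HAVE above */
data have;
  input variable @@;
cards;
1 1 2 3 3 4
;

/* Approach 1: PROC SQL, one row per distinct value */
proc sql;
  create table single as
    select distinct variable from have;
quit;

/* Approach 2: BY-group processing needs sorted data */
proc sort data=have;
  by variable;
run;

data dup nodup blank single2;
  set have;
  by variable;
  /* First row of each value: matches the SQL result */
  if first.variable then output single2;
  /* Value occurs exactly once */
  if first.variable and last.variable then output nodup;
  /* Assumption: MISSING() in place of VARIABLE="" */
  else if missing(variable) then output blank;
  /* Every row of a repeated value */
  else output dup;
run;&lt;/CODE&gt;&lt;/PRE&gt;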
&lt;P&gt;The distinct set of values is 1, 2, 3, 4, so SINGLE gets four observations. The values that appear only once are 2 and 4, so NODUP gets two; the other four observations (1, 1, 3, 3) go to DUP.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 18:11:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563582#M158018</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-06-04T18:11:50Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql and data step for testing duplications are giving different numbers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563583#M158019</link>
      <description>Got it! Thank you so much for your time!</description>
      <pubDate>Tue, 04 Jun 2019 18:16:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-and-data-step-for-testing-duplications-are-giving/m-p/563583#M158019</guid>
      <dc:creator>mamin088</dc:creator>
      <dc:date>2019-06-04T18:16:52Z</dc:date>
    </item>
  </channel>
</rss>

