<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient way to identify unique records in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463666#M118152</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;already answered your question, but there are still other ways of expressing the same thing. Apart from the PROC SORT approach, they all run with essentially the same efficiency.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;mentioned, PROC SORT can also be used but, for what you want, with different options (NOUNIQUEKEY and UNIQUEOUT=) than the NODUPKEY option&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;suggested:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.class;
  do i=1 to 100000;
    if i eq 1 then id=_n_*100000;
    else id=_n_*100000+1;
    output;
  end;
run;
data want;
  set have;
  by id;
  if first.id and last.id;
run;
data want;
  set have;
  by id;
  if first.id eq 1 and last.id eq 1;
run;

data want;
  set have;
  by id;
  if first.id +last.id eq 2;
run;

data want;
  set have;
  by id;
  if first.id *last.id;
run;

data want;
  set have;
  by id;
  if first.id *last.id eq 1;
run;

proc sort data=have out=dontwant nouniquekey noequals UNIQUEOUT=want;
  by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;And, as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;mentioned, if the file has to be sorted first anyway, the PROC SORT approach may actually end up being the fastest method.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 20 May 2018 21:24:58 GMT</pubDate>
    <dc:creator>art297</dc:creator>
    <dc:date>2018-05-20T21:24:58Z</dc:date>
    <item>
      <title>Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463646#M118149</link>
      <description>&lt;P&gt;Unique records can be identified with either of the following:&lt;/P&gt;
&lt;P&gt;Approach 1&lt;/P&gt;
&lt;P&gt;if first.var and last.var&lt;/P&gt;
&lt;P&gt;Approach 2&lt;/P&gt;
&lt;P&gt;if first.var=1 and last.var=1&lt;/P&gt;
&lt;P&gt;Is there any difference between them in terms of efficiency and time?&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 19:05:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463646#M118149</guid>
      <dc:creator>thesasuser</dc:creator>
      <dc:date>2018-05-20T19:05:17Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463653#M118150</link>
      <description>&lt;P&gt;In my opinion, both perform the same evaluation, so no.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Other than that, the only difference I can think of is that during tokenization SAS has to process the extra tokens (i.e., =1) before passing the statement to the compiler. Tokenizing happens only once, when the DATA step is compiled, not once per record, so the difference is insignificant.&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 19:52:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463653#M118150</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-05-20T19:52:01Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463665#M118151</link>
      <description>&lt;P&gt;Try PROC SORT with the NODUPKEY option as well. Since you often have to sort before using FIRST./LAST. processing anyway, you can sometimes avoid extra steps, which is more efficient.&lt;/P&gt;
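&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One caveat: NODUPKEY removes duplicates rather than selecting unique records, so it keeps one record per id even for ids that occur several times. A minimal sketch of the difference, using a small made-up data set (SAMPLE, ONE_PER_ID, DUP_GROUPS and UNIQUE_ONLY are hypothetical names); the NOUNIQUEKEY/UNIQUEOUT= combination is the one that matches the FIRST./LAST. test:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical sample: ids 1 and 3 occur once, id 2 occurs three times */
data sample;
  input id value;
  datalines;
1 10
2 20
2 21
2 22
3 30
;
run;

/* NODUPKEY keeps the first record of every BY group, so id 2 is still
   represented: 3 records (ids 1, 2 and 3) */
proc sort data=sample out=one_per_id nodupkey;
  by id;
run;

/* NOUNIQUEKEY drops the single-record groups from OUT= (3 records, all
   id 2) and UNIQUEOUT= receives exactly those groups: 2 records (ids 1
   and 3), the same rows that FIRST.ID and LAST.ID would select */
proc sort data=sample out=dup_groups nouniquekey uniqueout=unique_only;
  by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;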
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/39715"&gt;@thesasuser&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Unique records can be identified with either of the following:&lt;/P&gt;
&lt;P&gt;Approach 1&lt;/P&gt;
&lt;P&gt;if first.var and last.var&lt;/P&gt;
&lt;P&gt;Approach 2&lt;/P&gt;
&lt;P&gt;if first.var=1 and last.var=1&lt;/P&gt;
&lt;P&gt;Is there any difference between them in terms of efficiency and time?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 21:18:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463665#M118151</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-05-20T21:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463666#M118152</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;already answered your question, but there are still other ways of expressing the same thing. Apart from the PROC SORT approach, they all run with essentially the same efficiency.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;mentioned, PROC SORT can also be used but, for what you want, with different options (NOUNIQUEKEY and UNIQUEOUT=) than the NODUPKEY option&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;suggested:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.class;
  do i=1 to 100000;
    if i eq 1 then id=_n_*100000;
    else id=_n_*100000+1;
    output;
  end;
run;
data want;
  set have;
  by id;
  if first.id and last.id;
run;
data want;
  set have;
  by id;
  if first.id eq 1 and last.id eq 1;
run;

data want;
  set have;
  by id;
  if first.id +last.id eq 2;
run;

data want;
  set have;
  by id;
  if first.id *last.id;
run;

data want;
  set have;
  by id;
  if first.id *last.id eq 1;
run;

proc sort data=have out=dontwant nouniquekey noequals UNIQUEOUT=want;
  by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
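&lt;P&gt;If you want to confirm that two of these forms really select the same records, one quick check is to write each to its own data set and run PROC COMPARE on the results. A minimal sketch, using a smaller version of HAVE so it runs in a moment (HAVE_SMALL, WANT1 and WANT2 are illustrative names, not part of the code above):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Smaller version of HAVE: 3 records per student; record 1 gets its own
   id, records 2-3 share a second id, so there are 19 unique ids */
data have_small;
  set sashelp.class;
  do i=1 to 3;
    if i eq 1 then id=_n_*10;
    else id=_n_*10+1;
    output;
  end;
run;

/* Approach 1 from the question */
data want1;
  set have_small;
  by id;
  if first.id and last.id;
run;

/* Approach 2 from the question */
data want2;
  set have_small;
  by id;
  if first.id eq 1 and last.id eq 1;
run;

/* Reports any differences between the two outputs; both should hold the
   same 19 records */
proc compare base=want1 compare=want2;
run;
&lt;/CODE&gt;&lt;/PRE&gt;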
&lt;P&gt;And, as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;mentioned, if the file has to be sorted first anyway, the PROC SORT approach may actually end up being the fastest method.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 21:24:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463666#M118152</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-20T21:24:58Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463667#M118153</link>
      <description>&lt;P&gt;Thanks.&lt;/P&gt;
&lt;P&gt;This was an interview question. The interviewer said that one of them is more efficient.&lt;/P&gt;
&lt;P&gt;To me, both looked equally efficient.&amp;nbsp;&lt;BR /&gt;I wanted to know from the community whether I am correct.&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 21:23:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463667#M118153</guid>
      <dc:creator>thesasuser</dc:creator>
      <dc:date>2018-05-20T21:23:22Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463668#M118154</link>
      <description>&lt;P&gt;Well well well, does the interviewer think processing 2 more tokens makes it less efficient? Jeez!!!&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 21:25:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463668#M118154</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-05-20T21:25:21Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463673#M118155</link>
      <description>&lt;P&gt;I’m not fond of this, but sometimes people will ask obviously wrong questions to see how a candidate deals with them. People who get super defensive or flustered about a mistake are what they’re looking for in this case.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 22:19:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463673#M118155</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-05-20T22:19:00Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient way to identify unique records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463674#M118156</link>
      <description>&lt;P&gt;Interesting footnote to this question. Did you know that you can specify _null_ as an output file to proc sort? Before today, I didn't!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, in trying to test the idea that the nouniquekey and uniqueout options might be faster than doing a sort and then using first. and last. in a datastep, I was bothered that I (thought I) had to create a dummy output file. Not so!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=_null_ nouniquekey noequals UNIQUEOUT=want;
  by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On a 1.9 million record file, with 19 unique ids, the above only took 0.49 seconds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The sort alone of the initial file (needed, of course, in order to use first. and last. processing) took 0.68 seconds.&lt;/P&gt;
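&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For anyone who wants to rerun this kind of comparison, here is a sketch under the same assumptions as my earlier post (the test data are regenerated so the code is self-contained). OPTIONS FULLSTIMER writes real and CPU times for every step to the log; HAVE_SORTED, WANT_A and WANT_B are just illustrative names, and your timings will depend on your system and data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer;  /* log detailed real and CPU times for each step */

/* Test data as in the earlier post: 1.9 million records, 19 unique ids */
data have;
  set sashelp.class;
  do i=1 to 100000;
    if i eq 1 then id=_n_*100000;
    else id=_n_*100000+1;
    output;
  end;
run;

/* Approach A: sort first, then FIRST./LAST. processing in a DATA step */
proc sort data=have out=have_sorted;
  by id;
run;

data want_a;
  set have_sorted;
  by id;
  if first.id and last.id;
run;

/* Approach B: a single PROC SORT step; duplicated keys go to OUT=_NULL_
   (discarded) and unique keys go to UNIQUEOUT= */
proc sort data=have out=_null_ nouniquekey noequals uniqueout=want_b;
  by id;
run;

/* Optional check that both approaches kept the same records */
proc compare base=want_a compare=want_b;
run;
&lt;/CODE&gt;&lt;/PRE&gt;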
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Not that I've ever needed a file of only the records that don't have duplicates, but it's nice to know which process is the most efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 20 May 2018 22:19:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-way-to-identify-unique-records/m-p/463674#M118156</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2018-05-20T22:19:35Z</dc:date>
    </item>
  </channel>
</rss>

