<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: eliminating duplicates in a data step in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405521#M12365</link>
    <description>&lt;P&gt;Yeah, I was afraid of that. I am currently using PROC SQL with SELECT DISTINCT to filter them out.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Oct 2017 13:18:11 GMT</pubDate>
    <dc:creator>robm</dc:creator>
    <dc:date>2017-10-19T13:18:11Z</dc:date>
    <item>
      <title>eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405419#M12358</link>
      <description>&lt;P&gt;I have a data step like this that creates a 4 GB file. Unfortunately it contains duplicates; without them it would shrink to 150,000 KB. Is there an equivalent of “distinct” for the SAS syntax below?&lt;/P&gt;
&lt;PRE&gt;  data ROBM.LIMITFILE1;
    set _egimle.&amp;amp;timeperiodvalue (keep=&amp;amp;limitvar1field &amp;amp;limitvar1fielddesc &amp;amp;limitvar2field &amp;amp;limitvar2fielddesc &amp;amp;limitvar3field &amp;amp;limitvar3fielddesc &amp;amp;limitvar4field
                                        &amp;amp;limitvar4fielddesc &amp;amp;limitvar5field &amp;amp;limitvar5fielddesc &amp;amp;limitvar6field &amp;amp;limitvar6fielddesc &amp;amp;limitvar7field &amp;amp;limitvar7fielddesc
                                        &amp;amp;limitvar8field &amp;amp;limitvar8fielddesc filter=("&amp;amp;trendvalue" AND "&amp;amp;timeperiodfilter"));&lt;/PRE&gt;
</description>
      <pubDate>Thu, 19 Oct 2017 06:01:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405419#M12358</guid>
      <dc:creator>robm</dc:creator>
      <dc:date>2017-10-19T06:01:53Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405426#M12359</link>
      <description>&lt;P&gt;Filtering duplicates usually involves sorting by the variables that define a duplicate.&lt;/P&gt;
&lt;P&gt;You can use either PROC SORT with NODUPKEY or PROC SQL with SELECT DISTINCT.&lt;/P&gt;
&lt;P&gt;PROC SORT can be more efficient; SELECT DISTINCT compares entire rows, which makes it equivalent to the NODUPREC option of PROC SORT.&lt;/P&gt;
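&lt;P&gt;A minimal sketch of both options, using SASHELP.CARS as stand-in data; the key variables MAKE, TYPE and ORIGIN and the output table names are illustrative assumptions, not the fields from the question. Both tables should end up with one row per distinct MAKE/TYPE/ORIGIN combination:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Option 1: NODUPKEY keeps the first observation of each BY group */
proc sort data=sashelp.cars(keep=make type origin)
          out=work.limit_sort nodupkey;
  by make type origin;
run;

/* Option 2: SELECT DISTINCT removes rows whose selected columns all match */
proc sql;
  create table work.limit_sql as
  select distinct make, type, origin
  from sashelp.cars;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With NODUPKEY only the BY variables decide what counts as a duplicate, so any additional kept variables come from the first observation in each group; SELECT DISTINCT drops a row only when every selected column matches.&lt;/P&gt;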
&lt;P&gt;If your source data set is already sorted by those variables, you can use BY-group processing with FIRST. or LAST. in a DATA step to keep only one observation per BY group.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 07:05:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405426#M12359</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-10-19T07:05:58Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405427#M12360</link>
      <description>&lt;P&gt;BTW, there is no FILTER= data set option in SAS; use WHERE= instead.&lt;/P&gt;
&lt;P&gt;And compare this code with yours:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ROBM.LIMITFILE1;
set _egimle.&amp;amp;timeperiodvalue (
  keep=
    &amp;amp;limitvar1field &amp;amp;limitvar1fielddesc
    &amp;amp;limitvar2field &amp;amp;limitvar2fielddesc
    &amp;amp;limitvar3field &amp;amp;limitvar3fielddesc
    &amp;amp;limitvar4field &amp;amp;limitvar4fielddesc
    &amp;amp;limitvar5field &amp;amp;limitvar5fielddesc 
    &amp;amp;limitvar6field &amp;amp;limitvar6fielddesc
    &amp;amp;limitvar7field &amp;amp;limitvar7fielddesc
    &amp;amp;limitvar8field &amp;amp;limitvar8fielddesc
  where=(trend="&amp;amp;trendvalue" AND time="&amp;amp;timeperiodfilter")
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;which is more readable and therefore easier to maintain?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 07:09:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405427#M12360</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-10-19T07:09:05Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405519#M12363</link>
      <description>&lt;P&gt;You are right, it does.&lt;/P&gt;
&lt;P&gt;Thanks, Kurt.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 13:17:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405519#M12363</guid>
      <dc:creator>robm</dc:creator>
      <dc:date>2017-10-19T13:17:05Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405521#M12365</link>
      <description>&lt;P&gt;Yeah, I was afraid of that. I am currently using PROC SQL with SELECT DISTINCT to filter them out.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 13:18:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405521#M12365</guid>
      <dc:creator>robm</dc:creator>
      <dc:date>2017-10-19T13:18:11Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405577#M12371</link>
      <description>&lt;P&gt;Why do you have duplicates in the first place? It seems worth backing up a few steps and removing them there.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 14:52:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405577#M12371</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-10-19T14:52:14Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405610#M12374</link>
      <description>&lt;P&gt;Hi Reeza&lt;/P&gt;
&lt;P&gt;The original data set is unique because the measure holds a dollar value for each class a student took. I am trying to build a "LIMIT" data set in the WORK library on the fly for this web-based stored process, so it only needs the filter fields in unique form. But with the DATA step method I have no way of filtering out the dupes unless I use PROC SQL or PROC SORT. Does that make sense?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 15:37:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405610#M12374</guid>
      <dc:creator>robm</dc:creator>
      <dc:date>2017-10-19T15:37:52Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405612#M12375</link>
      <description>&lt;P&gt;It depends: what's the LIMIT data set for? If it's to create some limits, I assume you'd be computing statistics on the $ value, which would lead to PROC MEANS or similar.&lt;/P&gt;
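&lt;P&gt;If statistics are needed, one possible sketch uses PROC SUMMARY (PROC MEANS without printed output). SASHELP.CARS, the CLASS variables and the AVG_MSRP name are assumptions for illustration; NWAY keeps only the full combinations, so each MAKE/TYPE/ORIGIN combination appears once with its mean MSRP:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=sashelp.cars nway;
  class make type origin;   /* one output row per combination */
  var msrp;
  output out=work.limit_stats(drop=_type_ _freq_) mean=avg_msrp;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Observations with a missing CLASS value are excluded unless the MISSING option is added to the PROC SUMMARY statement.&lt;/P&gt;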
&lt;P&gt;But otherwise, that does seem like the only way. PROC SORT plus a DATA step can identify duplicates, but other procedures such as PROC SQL are easier to type out, so that's a fair approach IMO.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2017 15:41:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405612#M12375</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-10-19T15:41:47Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405743#M12383</link>
      <description>&lt;P&gt;You could consider a hash object to eliminate duplicates in a data step.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let keep=make model type origin msrp cylinders;
data robm.limitfile1;
  if _N_=1 then do;
    if 0 then set sashelp.cars(keep=&amp;amp;keep);
    declare hash H(
       dataset:"sashelp.cars(keep=&amp;amp;keep where=(cylinders=6))",
       ordered:"ascending");
	H.definekey("make");
	H.definedata("make","model","type","origin","msrp","cylinders");
	H.definedone();
	H.output(dataset:"work.limit");
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
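&lt;P&gt;The hash object rejects duplicate keys by default and keeps the first observation loaded for each key, so with MAKE as the only key the result has one row per MAKE. Leave off the &lt;EM&gt;ordered&lt;/EM&gt; parameter if order does not matter in the result.&lt;/P&gt;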
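&lt;P&gt;To treat rows as duplicates only when every column matches, as SELECT DISTINCT does, a sketch can make all loaded variables part of the key with DEFINEKEY(ALL:"YES"). The variable list and the WORK.LIMIT_DISTINCT name are illustrative:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  /* never executes; only defines the variables the hash object needs */
  if 0 then set sashelp.cars(keep=make type origin);
  declare hash H(dataset:"sashelp.cars(keep=make type origin)",
                 ordered:"ascending");
  H.definekey(all:"yes");   /* every loaded variable is part of the key */
  H.definedata(all:"yes");  /* ...and is written by OUTPUT */
  H.definedone();
  H.output(dataset:"work.limit_distinct");
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;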
</description>
      <pubDate>Thu, 19 Oct 2017 21:29:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405743#M12383</guid>
      <dc:creator>RLigtenberg</dc:creator>
      <dc:date>2017-10-19T21:29:44Z</dc:date>
    </item>
    <item>
      <title>Re: eliminating duplicates in a data step</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405838#M12384</link>
      <description>&lt;P&gt;And if you need to do additional processing on each observation after the de-duplication ("filter"), you can use a hash iterator (hiter) object.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let keep=make model type origin msrp cylinders;
data limitfile1;
  length tax 8.;
  if _N_=1 then do;
    if 0 then set sashelp.cars(keep=&amp;amp;keep);
    declare hash H(
       dataset:"sashelp.cars(keep=&amp;amp;keep where=(cylinders=6))");
	H.definekey("make");
	H.definedata("make","model","type","origin","msrp","cylinders");
	H.definedone();
	declare hiter I("H");
  end;
  rc=0;
  do while(rc=0);
    rc=I.next();
	tax=msrp*.35;
	output;
  end;
  stop;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Fri, 20 Oct 2017 00:41:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/eliminating-duplicates-in-a-data-step/m-p/405838#M12384</guid>
      <dc:creator>RLigtenberg</dc:creator>
      <dc:date>2017-10-20T00:41:54Z</dc:date>
    </item>
  </channel>
</rss>

