<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to remove duplicates in a portion of data set? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388763#M65962</link>
    <description>&lt;P&gt;Try the TAGSORT option of PROC SORT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=month_sorted nodupkey tagsort sortsize=max;
  by &amp;lt;key variables&amp;gt;;  /* PROC SORT requires a BY statement */
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 17 Aug 2017 12:52:37 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2017-08-17T12:52:37Z</dc:date>
    <item>
      <title>How to remove duplicates in a portion of data set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388691#M65958</link>
      <description>Hi All, I have a 200 GB data set and I want to remove duplicates within a particular month. If I run NODUPKEY on the full data set, the utility space fills up and the process fails. Is there an option to remove duplicates in only a portion of the data set? I tried subsetting the data set for the specific month, removing the duplicates, and appending the result back to the original data set. However, I then have to sort the original data set again, which uses even more space. Kindly let me know if there is any way to remove duplicates from a portion of a data set. Thanks in advance!</description>
      <pubDate>Thu, 17 Aug 2017 05:13:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388691#M65958</guid>
      <dc:creator>LOVE_SAA</dc:creator>
      <dc:date>2017-08-17T05:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates in a portion of data set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388694#M65959</link>
      <description>&lt;P&gt;A sort needs approximately 2.5 times the disk space of the original data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is the data already sorted by a key that includes month?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, you can do:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* De-duplicate only the desired month */
proc sort data=have(where=(month=&amp;lt;desired&amp;gt;))
          out=month_sorted nodupkey;
  by &amp;lt;key variables&amp;gt;;
run;

/* Concatenate: earlier months, the de-duplicated month, later months */
data new;
  set have(where=(month &amp;lt; &amp;lt;desired&amp;gt;))
      month_sorted
      have(where=(month &amp;gt; &amp;lt;desired&amp;gt;));
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 Aug 2017 05:25:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388694#M65959</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-08-17T05:25:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates in a portion of data set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388698#M65960</link>
      <description>Thanks, Shmuel! Yes, the data set is already sorted by the key, which includes month.</description>
      <pubDate>Thu, 17 Aug 2017 05:47:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388698#M65960</guid>
      <dc:creator>LOVE_SAA</dc:creator>
      <dc:date>2017-08-17T05:47:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates in a portion of data set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388763#M65962</link>
      <description>&lt;P&gt;Try the TAGSORT option of PROC SORT.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=month_sorted nodupkey tagsort sortsize=max;
  by &amp;lt;key variables&amp;gt;;  /* PROC SORT requires a BY statement */
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 Aug 2017 12:52:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/388763#M65962</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-08-17T12:52:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove duplicates in a portion of data set?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/389035#M65995</link>
      <description>&lt;P&gt;Hi Ksharp,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As Shmuel noted, &lt;U&gt;&lt;STRONG&gt;a sort needs approximately 2.5 times the disk space of the original data set&lt;/STRONG&gt;&lt;/U&gt;. So in my case I tried Shmuel's suggestion, and the CPU and I/O statistics look good.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One more point about the data set I worked on: it is approximately 1 TB uncompressed, and 200 GB on disk because it is compressed. So&amp;nbsp;I observed that working on segments of a huge data set works fine.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for your suggestion!&lt;/P&gt;</description>
      <pubDate>Fri, 18 Aug 2017 04:25:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-remove-duplicates-in-a-portion-of-data-set/m-p/389035#M65995</guid>
      <dc:creator>LOVE_SAA</dc:creator>
      <dc:date>2017-08-18T04:25:27Z</dc:date>
    </item>
  </channel>
</rss>

