<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: reduce output data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610566#M177813</link>
    <description>&lt;P&gt;As you've discovered, you can't use a dataset option to limit the number of observations written to DUPS10 (or, in general, to any output dataset).&amp;nbsp; Of course you could run a second data step, as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/83078"&gt;@SuryaKiran&lt;/a&gt; demonstrated.&amp;nbsp; That technique gives the first 10 duplicates in SORTED order.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is a way to do the complete task in a single data step, using two hash objects: one (named SRTED below) yields a sorted copy of HAVE with no duplicate keys, and the second (NAM) provides the exact count of duplicates you want:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.class;
  ran=ranuni(012498105);
  output;
  ran=ranuni(049810444);
  output;
  ran=ranuni(0259866);
  output;
run;
proc sort data=have out=have (drop=ran);
  by ran;
run;

data duplicates (drop=_:);
  set have;
  if _n_=1 then do;
    declare hash srted(dataset:'have',ordered:'A');
      srted.definekey('name');
      srted.definedata(all:'Y');
      srted.definedone();
      srted.output(dataset:'have_sorted');
     declare hash nam();
       nam.definekey('name');
       nam.definedone();
  end;
  if nam.find()^=0 then nam.add();
  else do;
    output;
    _ndupes+1;
    if _ndupes&amp;gt;=10 then stop;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The resulting data set HAVE_SORTED will have only one record per name, because that is the default behavior of SAS hash objects.&amp;nbsp; It will hold the record containing the first instance of each sort key, exactly as the NODUPKEY option in PROC SORT would.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The NAM hash is used to track whether an incoming record (from the original HAVE dataset) has been encountered before.&amp;nbsp; If it has, then it is a duplicate, to be output to DUPLICATES and counted in _NDUPES.&amp;nbsp; Note that these duplicates will not be in sorted order, since they are based on processing the original unsorted HAVE.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 09 Dec 2019 22:23:00 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2019-12-09T22:23:00Z</dc:date>
    <item>
      <title>reduce output data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610507#M177790</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to remove duplicates and save only the first 10 duplicates to a new dataset. Is that possible?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have nodupkey out=nodup dupout= dups10(outobs=10 ) ;
by id1 id2;
run;
*outobs=10 option does not work;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 16:41:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610507#M177790</guid>
      <dc:creator>GeorgeSAS</dc:creator>
      <dc:date>2019-12-09T16:41:23Z</dc:date>
    </item>
    <item>
      <title>Re: reduce output data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610508#M177791</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10531"&gt;@GeorgeSAS&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can do it using PROC SQL; the WHERE clause will hold whatever criteria you want applied.&amp;nbsp; You can also do it in a DATA step; I'm just more comfortable using SQL.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql outobs=10;
select distinct table_name.variable_name
from table_name
where ….;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 16:50:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610508#M177791</guid>
      <dc:creator>DarthPathos</dc:creator>
      <dc:date>2019-12-09T16:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: reduce output data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610509#M177792</link>
      <description>&lt;P&gt;Sounds like you want to do this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=step1;
  by id1 id2;
run;
data nodup dups10;
   set step1;
   by id1 id2;
   if first.id2 then output nodup;
   else if _ndups &amp;lt; 10 then do;
      _ndups+1;
      output dups10;
   end;
  drop _ndups;
run;
proc delete data=step1; 
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 09 Dec 2019 18:17:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610509#M177792</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-09T18:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: reduce output data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610553#M177810</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can't use OBS= on the output table being created;&amp;nbsp;&lt;SPAN&gt;OBS= is valid only when an existing SAS data set is read. OUTOBS= is used only in PROC SQL.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This might work.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=sashelp.class out=nodups dupout=dups nodupkey;
by sex;
run;

data dups;
set dups(obs=10);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 20:30:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610553#M177810</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2019-12-09T20:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: reduce output data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610566#M177813</link>
      <description>&lt;P&gt;As you've discovered, you can't use a dataset option to limit the number of observations written to DUPS10 (or, in general, to any output dataset).&amp;nbsp; Of course you could run a second data step, as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/83078"&gt;@SuryaKiran&lt;/a&gt; demonstrated.&amp;nbsp; That technique gives the first 10 duplicates in SORTED order.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is a way to do the complete task in a single data step, using two hash objects: one (named SRTED below) yields a sorted copy of HAVE with no duplicate keys, and the second (NAM) provides the exact count of duplicates you want:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  set sashelp.class;
  ran=ranuni(012498105);
  output;
  ran=ranuni(049810444);
  output;
  ran=ranuni(0259866);
  output;
run;
proc sort data=have out=have (drop=ran);
  by ran;
run;

data duplicates (drop=_:);
  set have;
  if _n_=1 then do;
    declare hash srted(dataset:'have',ordered:'A');
      srted.definekey('name');
      srted.definedata(all:'Y');
      srted.definedone();
      srted.output(dataset:'have_sorted');
     declare hash nam();
       nam.definekey('name');
       nam.definedone();
  end;
  if nam.find()^=0 then nam.add();
  else do;
    output;
    _ndupes+1;
    if _ndupes&amp;gt;=10 then stop;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The resulting data set HAVE_SORTED will have only one record per name, because that is the default behavior of SAS hash objects.&amp;nbsp; It will hold the record containing the first instance of each sort key, exactly as the NODUPKEY option in PROC SORT would.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The NAM hash is used to track whether an incoming record (from the original HAVE dataset) has been encountered before.&amp;nbsp; If it has, then it is a duplicate, to be output to DUPLICATES and counted in _NDUPES.&amp;nbsp; Note that these duplicates will not be in sorted order, since they are based on processing the original unsorted HAVE.&lt;/P&gt;
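&lt;P&gt;The same "have I seen this key before" idea can be adapted to the two-key case in the original question.&amp;nbsp; What follows is only a minimal sketch, assuming HAVE contains the variables ID1 and ID2; the hash name SEEN and the counter _NDUPS are illustrative names, not part of the code above.&amp;nbsp; It writes the first record for each key to NODUP and at most 10 later repeats to DUPS10, both in the original (unsorted) order of HAVE:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data nodup dups10;
  set have;
  if _n_=1 then do;
    /* SEEN (hypothetical name) holds every ID1/ID2 pair read so far */
    declare hash seen();
      seen.definekey('id1','id2');
      seen.definedone();
  end;
  if seen.check()^=0 then do;        /* key not in hash: first occurrence */
    seen.add();
    output nodup;
  end;
  else if _ndups&amp;lt;10 then do;     /* repeated key: keep only the first 10 */
    _ndups+1;
    output dups10;
  end;
  drop _ndups;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Unlike the step above, this sketch does not STOP after the tenth duplicate, because every record must still be read to finish NODUP.&lt;/P&gt;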
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2019 22:23:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reduce-output-data/m-p/610566#M177813</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-12-09T22:23:00Z</dc:date>
    </item>
  </channel>
</rss>

