<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: sort and take 1st row for each ID in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676486#M203988</link>
    <description>&lt;P&gt;This code could replace PROC SORT to save time.&lt;/P&gt;
&lt;P&gt;Make a macro and call execute() to go through all the id levels to get then final result by combining these sub-datasets .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data F M;
 set sashelp.class;
 select(sex);
 when ('F')  output F;
 when ('M')  output M;
 else;
 end;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 13 Aug 2020 13:40:16 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2020-08-13T13:40:16Z</dc:date>
    <item>
      <title>sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676141#M203826</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;I want to ask a question about sort a very big data set and take first raw for each ID.&lt;/P&gt;
&lt;P&gt;Let's say that the raw data (Called: ttt) has 3 columns: ID, bank, amount.&lt;/P&gt;
&lt;P&gt;I want to sort by ID,amount and take for each ID the raw with highest amount.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;The problem that this query takes very long time.&lt;/P&gt;
&lt;P&gt;Is there a better way to do it?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=ttt;
by ID descending amount;
Run;
Data wanted;
set ttt;
by ID ;
if first.ID then output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 13:04:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676141#M203826</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2020-08-12T13:04:49Z</dc:date>
    </item>
    <item>
      <title>Re: sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676144#M203828</link>
      <description>&lt;P&gt;I assume that it is the Sort process that takes time?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is it a requirement to sort the data?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 13:16:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676144#M203828</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-08-12T13:16:08Z</dc:date>
    </item>
    <item>
      <title>Re: sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676154#M203833</link>
      <description>&lt;P&gt;If the data is already sorted by ID (but not descending amount), you can skip the sort like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set have;
  by id;
  if first.id then _max=amount;
  else _max= max(_max,amount);
  if amount=_max then _bank=bank;
  if last.id;
  bank=_bank;
  amount=_max;
  retain _:;
  drop _:;
run;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Else you can use a hash table in order to create the table you want without sorting first:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  set have end=done;
  _Bank=Bank; 
  _amount=amount;
  if _N_=1 then do;
    declare hash h(ordered: 'Y');
    h.definekey('ID');
    h.definedata('ID','Bank','Amount');
    h.definedone();
    end;
  if h.find() then
    h.add();
  else if _amount&amp;gt;amount then do;
    Bank=_Bank;
    Amount=_Amount;
    h.replace();
    end;
  if done;
  h.output(dataset: 'Want');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 13:46:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676154#M203833</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2020-08-12T13:46:13Z</dc:date>
    </item>
    <item>
      <title>Re: sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676176#M203844</link>
      <description>&lt;P&gt;Yes, sort process last forever and never finish.&lt;/P&gt;
&lt;P&gt;The data set includes 25 million rows.&lt;/P&gt;
&lt;P&gt;Please note that there is no key to the data set so cannot define index.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 14:19:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676176#M203844</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2020-08-12T14:19:19Z</dc:date>
    </item>
    <item>
      <title>Re: sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676201#M203855</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Yes, sort process last forever and never finish.&lt;/P&gt;
&lt;P&gt;The data set includes 25 million rows.&lt;/P&gt;
&lt;P&gt;Please note that there is no key to the data set so cannot define index.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Does your data set contain 100's or 1000's of other variables? If so try the TAGSORT option. That can reduce the size of temporary data sets used in sorting and reduce disk IO common with the sort process.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;25 million is not really that large of data set and if it taking a long time and "never" finishes - just how long did you wait? then you may be sorting in an external database? (Use that DB sort tools or pass through code) . I just created a data set with 25 million records, random values which may take longer to sort than grouping variables like IDs, and that took about 9.7 minutes.&lt;/P&gt;
&lt;P&gt;or you are working on a really crappy network connection or similar.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Aug 2020 15:24:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676201#M203855</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-08-12T15:24:09Z</dc:date>
    </item>
    <item>
      <title>Re: sort and take 1st row for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676486#M203988</link>
      <description>&lt;P&gt;This code could replace PROC SORT to save time.&lt;/P&gt;
&lt;P&gt;Make a macro and call execute() to go through all the id levels to get then final result by combining these sub-datasets .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data F M;
 set sashelp.class;
 select(sex);
 when ('F')  output F;
 when ('M')  output M;
 else;
 end;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Aug 2020 13:40:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/sort-and-take-1st-row-for-each-ID/m-p/676486#M203988</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-08-13T13:40:16Z</dc:date>
    </item>
  </channel>
</rss>

