<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc sort large table in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868267#M342979</link>
    <description>but I sort by 13 variables so it's more complex</description>
    <pubDate>Wed, 05 Apr 2023 14:38:45 GMT</pubDate>
    <dc:creator>mariopellegrini</dc:creator>
    <dc:date>2023-04-05T14:38:45Z</dc:date>
    <item>
      <title>proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868155#M342917</link>
      <description>&lt;P&gt;I don't quite understand why a proc sort applied to a large table worsens performance if applied to the same dataset, but compressed.&lt;BR /&gt;The table in question weighs about 29Gb (305819687 observations and 16 variables, of which 13 variables used in the "by"), and by doing the COMPRESS the weight is reduced to only 26Gb. A first doubt I have is that usually compress reduces the occupied memory much more.&lt;BR /&gt;I wish I could optimize this proc sort but I didn't get good results, neither with the use of compress for the table, nor with the Bufsize= and Bufno = options (I tried them for all possible combinations), nor with the TAGSORT option .&lt;BR /&gt;With the SORTSIZE option I think the situation does not change because this is set to 32Gb, therefore a value already higher than the table size.&lt;BR /&gt;Here is the outcome of the log of the starting sort:&lt;/P&gt;
&lt;P&gt;&lt;FONT size="2"&gt;NOTE: There were 305819687 observations read from the data set ODS_VT.T_F_PTFVT_RISERVA_AWARDS.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;NOTE: The data set WORK.ETLS_SORTEDXREF has 305819687 observations and 13 variables.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;NOTES: PROCEDURE SORT used (total processing time):&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;real-time 8:29.88&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;user cpu time 2:35.53&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;system cpu time 53.34 seconds&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;memory 31708967.57k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;OS Memory 31729708.00k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Timestamp 04/05/2023 08:18:18 m.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Step Count 15 Switch Count 659&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;and here the result of the log after the application of the proc sort on the compressed table with the use of the TAGSORT option:&lt;/P&gt;
&lt;P&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;NOTE: Tagsort reads each observation of the input data set twice.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;NOTE: The data set WORK.ETLS_SORTEDXREF has 305819687 observations and 13 variables.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;NOTES: Compressing data set WORK.ETLS_SORTEDXREF decreased size by 7.66 percent.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;Compressed is 311022 pages; uncompressed would require 336806 pages.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;NOTES: PROCEDURE SORT used (total processing time):&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;real time 12:35.03&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;user cpu time 6:48.25&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;system cpu time 1:22.71&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;memory 20403259.93k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;OS Memory 20426884.00k&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;Timestamp 05/04/2023 08:35:33m.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;Step Count 19 Switch Count 1205&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 08:30:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868155#M342917</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T08:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868156#M342918</link>
      <description>&lt;P&gt;With a compression rate of only 10%, COMPRESS is not really useful, and the overhead of TAGSORT (reading the input twice) is not compensated by having a smaller utility file.&lt;/P&gt;
&lt;P&gt;Sorting 29 G in 8.5 minutes is not that bad, but a real time of 8.5 minutes vs. CPU time of 3.5 minutes points to your WORK storage being the bottleneck.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 08:35:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868156#M342918</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-04-05T08:35:49Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868165#M342921</link>
      <description>&lt;P&gt;Agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;, compression in this scenario will only slow down your process.&lt;/P&gt;
&lt;P&gt;The only quick solution I can think of if there's more available memory, raise MEMESIZE and SORTSIZE.&lt;/P&gt;
&lt;P&gt;I guess this is ajob that will run regularly. There might be a way yo could redesign your ETL not have to resort the whole table each time - but that needs in depth knowledge of your business rules, source system contents etc.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 09:40:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868165#M342921</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2023-04-05T09:40:53Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868166#M342922</link>
      <description>&lt;P&gt;I agree with Kurt's "&lt;SPAN&gt;Sorting 29 G in 8.5 minutes is not that bad".&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you want faster ,here is an example but need more code .&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Female Male;
 set sashelp.heart;
 if sex='Female' then output Female;
  else if sex='Male' then output Male;
run;

data want;
 set Female Male;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 05 Apr 2023 09:44:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868166#M342922</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2023-04-05T09:44:32Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868167#M342923</link>
      <description>&lt;P&gt;&lt;FONT style="vertical-align: inherit;"&gt;&lt;FONT style="vertical-align: inherit;"&gt;L'opzione SORTSIZE è impostata su 32Gb, mentre la tabella da ordinare è 29Gb, quindi che senso ha aumentare il valore SORTSIZE?&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 09:45:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868167#M342923</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T09:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868169#M342925</link>
      <description>&lt;P&gt;PROC SORT can use up to 3-4 times the size of the input table in spill files etc.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 09:49:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868169#M342925</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2023-04-05T09:49:44Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868178#M342928</link>
      <description>&lt;P&gt;what does this mean? that you can set a SORTSIZE for example to 60Gb?&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 10:42:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868178#M342928</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T10:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868192#M342935</link>
      <description>&lt;P&gt;Yes, as long it's equal to or smaller than your MEMSIZE setting.&lt;/P&gt;
&lt;P&gt;But don't set it to a higher value than you have actual memory availble to you. Otherwise the OS will do the swapping. You want to SAS do the swapping since it know its internal processing better than the OS:&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 11:16:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868192#M342935</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2023-04-05T11:16:02Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868198#M342939</link>
      <description>&lt;P&gt;So I think I changed the SORTSIZE option. Could you tell me how to restore the previous option? Thank you...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 11:38:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868198#M342939</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T11:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868199#M342940</link>
      <description>Get rid of TAGSORT!&lt;BR /&gt;You only have 16 variables in the data set, and you need 13 of those as BY variables.  So TAGSORT saves reading 3 variables but adds the need to pass through the data twice.  As long as the sorting actually completes, removing TAGSORT should speed things up.</description>
      <pubDate>Wed, 05 Apr 2023 11:48:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868199#M342940</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2023-04-05T11:48:46Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868202#M342942</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I agree with Kurt's "&lt;SPAN&gt;Sorting 29 G in 8.5 minutes is not that bad".&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you want faster ,here is an example but need more code .&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Female Male;
 set sashelp.heart;
 if sex='Female' then output Female;
  else if sex='Male' then output Male;
run;

data want;
 set Female Male;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;While this avoids the actual sort, it will still run the whole amount of data 4 times through I/O (same as PROC SORT without TAGSORT), and if the bottleneck is there, the gain will be minimal.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 12:04:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868202#M342942</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-04-05T12:04:04Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868224#M342948</link>
      <description>&lt;P&gt;even without TAGSORT the performance doesn't change much. even a little worse with the compressed dataset&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 13:08:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868224#M342948</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T13:08:36Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868256#M342970</link>
      <description>&lt;P&gt;Wow!&amp;nbsp; Thank you for testing it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That is so counter-intuitive (at least to me), that I wanted to confirm.&amp;nbsp; Was the test that you performed along these lines?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=_16_variables out=test1;
   by list of _13_ variables;
run;

proc sort data=_16_variables out=test2 tagsort;
   by list of _13_ variables;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 05 Apr 2023 14:19:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868256#M342970</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2023-04-05T14:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868258#M342972</link>
      <description>&lt;P&gt;I don't quite understand what the code you put means, it doesn't seem like a substitute for a proc sort&lt;/P&gt;
&lt;P&gt;this is my starting proc sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;proc sort data=_16_variables out=test1;
   by list of _13_ variables;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 14:22:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868258#M342972</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T14:22:57Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868259#M342973</link>
      <description>&lt;P&gt;This is a proc sort.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I didn't know the names of the 13 variables in your BY statement, so you would have to replace the BY statement with the actual variables you are using.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Rather than typing the name of your incoming data set, I used a made-up name _16_variables.&amp;nbsp; I expect you would use your actual data set name.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I used out= to avoid replacing the original data set.&amp;nbsp; If you replace the original data set, the second PROC SORT would be skipped since SAS detects that the data set is already in sorted order.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So the idea is to run PROC SORT twice on the same data set, using out= to name the sorted data set.&amp;nbsp; One PROC SORT uses TAGSORT and one doesn't.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If that's not clear, please ask again.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 14:29:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868259#M342973</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2023-04-05T14:29:03Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868260#M342974</link>
      <description>&lt;P&gt;If you were sorting by sex (male/female in this data), that example code splits the data set into one for each sex and then stacks them. A manual sort essentially, which is the equivalent of&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=class;&lt;/P&gt;
&lt;P&gt;by sex;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/3574"&gt;@mariopellegrini&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I don't quite understand what the code you put means, it doesn't seem like a substitute for a proc sort&lt;/P&gt;
&lt;P&gt;this is my starting proc sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;proc sort data=_16_variables out=test1;
   by list of _13_ variables;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 14:31:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868260#M342974</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2023-04-05T14:31:15Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868267#M342979</link>
      <description>but I sort by 13 variables so it's more complex</description>
      <pubDate>Wed, 05 Apr 2023 14:38:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868267#M342979</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-05T14:38:45Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868295#M342991</link>
      <description>Not really, your combinations are more but the principle is the same and if it's automated it doesn't matter. Do you have indexes on any of those 13 variables?</description>
      <pubDate>Wed, 05 Apr 2023 15:49:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868295#M342991</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2023-04-05T15:49:54Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort large table</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868300#M342995</link>
      <description>&lt;P&gt;Are you able to post the actual log (not just the notes on the log) from the test you ran?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I understand you may not be able to do that because of confidentiality reasons, but post what you can.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2023 16:15:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-large-table/m-p/868300#M342995</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2023-04-05T16:15:53Z</dc:date>
    </item>
  </channel>
</rss>

