<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PROC SORT WITH LARGE DATASET in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871155#M344107</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;follow &lt;A href="https://documentation.sas.com/doc/en/etlug/4.904/n139ymobeacculn1gnz7q6fyfior.htm" target="_blank" rel="noopener"&gt;this guide&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 21 Apr 2023 14:39:24 GMT</pubDate>
    <dc:creator>Oligolas</dc:creator>
    <dc:date>2023-04-21T14:39:24Z</dc:date>
    <item>
      <title>PROC SORT WITH LARGE DATASET</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871151#M344103</link>
      <description>&lt;P&gt;In general, when COMPRESS shrinks a large table only slightly, a PROC SORT applied to it does not get much faster, even with the TAGSORT or PRESORTED options. At that point, what can be done to reduce the processing time of a PROC SORT?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;These are the results without COMPRESS and with COMPRESS applied to table tab1, respectively:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;proc sort data = tab1&lt;BR /&gt;out = tab_out(keep=cod_source cod_compagnia&lt;BR /&gt;cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita&lt;BR /&gt;imp_riserva&lt;BR /&gt;cod_divisa dt_carico cod_tipo_agg_riserva) TAGSORT;&lt;BR /&gt;by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva&lt;BR /&gt;ts_fine_validita;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;NOTE: Tagsort reads each observation of the input data set twice.&lt;BR /&gt;NOTE: The data set tab_out has 301087776 observations and 13 variables.&lt;BR /&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;BR /&gt;real time 7:19.48&lt;BR /&gt;user cpu time 4:35.45&lt;BR /&gt;system cpu time 44.12 seconds&lt;BR /&gt;memory 20087284.06k&lt;BR /&gt;OS Memory 20111168.00k&lt;BR /&gt;Timestamp 21/04/2023 02:51:28 p.&lt;BR /&gt;Step Count 17 Switch Count 693&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data = tab1 out = tab_out(keep=cod_source cod_compagnia&lt;BR /&gt;cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva ts_inizio_validita ts_fine_validita&lt;BR /&gt;imp_riserva cod_divisa dt_carico cod_tipo_agg_riserva);&lt;BR /&gt;by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia cod_tipo_riserva dt_effetto_riserva&lt;BR /&gt;ts_fine_validita;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 301087776 observations read from the data set tab1.&lt;BR /&gt;NOTE: The data set tab_out has 301087776 observations and 13 variables.&lt;BR /&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;BR /&gt;real time 8:14.81&lt;BR /&gt;user cpu time 2:47.42&lt;BR /&gt;system cpu time 1:03.64&lt;BR /&gt;memory 31707434.82k&lt;BR /&gt;OS Memory 31729708.00k&lt;BR /&gt;Timestamp 21/04/2023 03:03:25 p.&lt;BR /&gt;Step Count 19 Switch Count 613&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2023 14:22:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871151#M344103</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2023-04-21T14:22:14Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SORT WITH LARGE DATASET</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871155#M344107</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;follow &lt;A href="https://documentation.sas.com/doc/en/etlug/4.904/n139ymobeacculn1gnz7q6fyfior.htm" target="_blank" rel="noopener"&gt;this guide&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2023 14:39:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871155#M344107</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2023-04-21T14:39:24Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SORT WITH LARGE DATASET</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871156#M344108</link>
      <description>Two items stick out like a sore thumb. First, your program sorts all the variables, then outputs the KEEP= subset. Apply KEEP= to the incoming data rather than the output. Second, TAGSORT doesn't help. If anything, it increases the time required. You can see the note in the log telling you that SAS had to read the data twice. That's because of TAGSORT. As a general rule, eliminate TAGSORT unless you are sorting so many variables that you can't find the memory that PROC SORT needs in order to complete.&lt;BR /&gt;&lt;BR /&gt;Added: notice that the link provided by the previous poster agrees that TAGSORT often drastically increases the time that PROC SORT takes.</description>
      <pubDate>Fri, 21 Apr 2023 14:51:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871156#M344108</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2023-04-21T14:51:05Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SORT WITH LARGE DATASET</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871161#M344109</link>
      <description>&lt;P&gt;Hi &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/3574"&gt;@mariopellegrini&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just from a mechanics perspective,&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The smaller the record length&lt;/STRONG&gt; --&amp;gt; the more records fit into the buffer (memory) --&amp;gt; the fewer times data needs to be fetched from disk into memory --&amp;gt; the faster processing finishes &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, as you can see, it all starts with a smaller record length. Here is a good SGF paper that gives you strategies and techniques for reducing the record length besides compression:&amp;nbsp; &lt;A title="Twenty Ways to Run Your SAS® Program Faster and Use Less Space" href="https://support.sas.com/resources/papers/proceedings20/4713-2020.pdf" target="_blank" rel="noopener"&gt;Twenty Ways to Run Your SAS® Program Faster and Use Less Space&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
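&lt;P&gt;Applied to the sort in the question, one way to shrink the record before it reaches the sort is to put KEEP= on the input data set instead of the output. This is only a sketch that reuses the table and variable names from the original post; I have not run it against that data.&lt;/P&gt;
&lt;PRE&gt;/* KEEP= on the input: only these 13 variables are read and sorted */
proc sort data = tab1(keep=cod_source cod_compagnia cod_contratto num_posizione
                           cod_garanzia cod_tipo_riserva dt_effetto_riserva
                           ts_inizio_validita ts_fine_validita imp_riserva
                           cod_divisa dt_carico cod_tipo_agg_riserva)
          out = tab_out;
  by cod_source cod_compagnia cod_contratto num_posizione cod_garanzia
     cod_tipo_riserva dt_effetto_riserva ts_fine_validita;
run;&lt;/PRE&gt;
&lt;P&gt;How much this helps depends on how many other variables tab1 carries; if tab1 holds only these 13 variables, the record length does not change.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;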
&lt;P&gt;I personally follow these points in my code&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. Use the LENGTH statement to define the length of character and numeric variables. This can achieve a significant reduction in the space used by the program.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;6.&lt;/SPAN&gt; &lt;SPAN&gt;Numeric variables in SAS data sets have a default length of 8.&lt;/SPAN&gt; &lt;SPAN&gt;If the values &lt;/SPAN&gt;&lt;SPAN&gt;of the numeric variable are all integers, you can reduce the space by using&lt;/SPAN&gt; &lt;SPAN&gt;the following table.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE id="n1mxiw7j4pg52an1kgtp0d2lxxbo" class="xisDoc-table"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TH class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;Length in Bytes&lt;/P&gt;
&lt;/TH&gt;
&lt;TH class="xisDoc-horizontalRight xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;Largest Integer Represented Exactly&lt;/P&gt;
&lt;/TH&gt;
&lt;TH class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;Exponential Notation&lt;/P&gt;
&lt;/TH&gt;
&lt;TH class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;Significant Digits Retained&lt;/P&gt;
&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;3&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;8,192&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;13&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalBottom"&gt;
&lt;P class="xisDoc-paragraph"&gt;3&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;4&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2,097,152&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;21&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;6&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;5&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;536,870,912&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;29&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;8&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;6&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;137,438,953,472&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;37&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;11&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;7&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;35,184,372,088,832&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;45&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;13&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;8&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalRight xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;9,007,199,254,740,992&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;2&lt;SUP class="xisDoc-superscript"&gt;53&lt;/SUP&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD class="xisDoc-horizontalLeft xisDoc-verticalTop"&gt;
&lt;P class="xisDoc-paragraph"&gt;15&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The largest integer in the table refers to the absolute value of the number. Calculate the largest value of the numeric variable, check that all values are integers by comparing each value with the result of the ROUND function, and then, if all values are integers, use the table above to determine the smallest length required. More details can be found in &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostwin/n04ccixfia6l2pn1f8szvttqg3hm.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostwin/n04ccixfia6l2pn1f8szvttqg3hm.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
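&lt;P&gt;A minimal sketch of that check, assuming a hypothetical data set work.have with a numeric variable num_pos (both names are made up for illustration, and the length of 4 is just an example outcome):&lt;/P&gt;
&lt;PRE&gt;/* Step 1: largest absolute value and number of non-integer values */
proc sql;
  select max(abs(num_pos))              as max_abs_value,
         sum(num_pos ne round(num_pos)) as n_non_integer
  from work.have;
quit;

/* Step 2: if n_non_integer is 0 and max_abs_value is at most 2,097,152,
   the table says a length of 4 bytes is enough */
data work.have_short;
  length num_pos 4;   /* LENGTH placed before SET so the new length is used */
  set work.have;
run;&lt;/PRE&gt;
&lt;P&gt;Missing values do not count as non-integers in this query. Do not shorten variables that hold fractional values, since the shorter length loses precision.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;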
&lt;P&gt;&lt;SPAN&gt;7.&lt;/SPAN&gt; &lt;SPAN&gt;Sometimes character variables imported into SAS from other systems, like &lt;/SPAN&gt;&lt;SPAN&gt;Oracle or Excel, have very large lengths.&lt;/SPAN&gt; &lt;SPAN&gt;You can use the following &lt;/SPAN&gt;&lt;SPAN&gt;procedure to get the shortest possible length for your character variable, &lt;/SPAN&gt;&lt;SPAN&gt;although you might want to allow room for growth:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN&gt;a.&lt;/SPAN&gt; &lt;SPAN&gt;Use the LENGTH function to calculate the actual length of the variable &lt;/SPAN&gt;&lt;SPAN&gt;in each observation in the data set.&lt;/SPAN&gt;&lt;BR role="presentation" /&gt;&lt;SPAN&gt;b.&lt;/SPAN&gt; &lt;SPAN&gt;Use the MAX option in PROC SUMMARY to get the largest value of the&lt;/SPAN&gt; &lt;SPAN&gt;length.&lt;/SPAN&gt;&lt;BR role="presentation" /&gt;&lt;SPAN&gt;c.&lt;/SPAN&gt; &lt;SPAN&gt;Use the LENGTH statement to shorten the length of the character&lt;/SPAN&gt; &lt;SPAN&gt;variable to the maximum length.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
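&lt;P&gt;Steps a to c could look roughly like this. The data set work.have, the character variable cod_text and the final length of 20 are assumptions for illustration only; use whatever maximum step b reports for your data.&lt;/P&gt;
&lt;PRE&gt;/* a. actual length of the value in each observation */
data work.len_check;
  set work.have(keep=cod_text);
  actual_len = length(cod_text);
run;

/* b. largest actual length */
proc summary data=work.len_check;
  var actual_len;
  output out=work.len_max(drop=_type_ _freq_) max=max_len;
run;

/* c. shorten the variable; 20 stands in for the max_len found in step b */
data work.have_short;
  length cod_text $20;   /* LENGTH must come before SET for character variables */
  set work.have;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;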
&lt;P&gt;&lt;SPAN&gt;8.&lt;/SPAN&gt; &lt;SPAN&gt;Switch variables from numeric to character if they are integers and range in &lt;/SPAN&gt;&lt;SPAN&gt;value from -9 to 99.&lt;/SPAN&gt; &lt;SPAN&gt;The minimum length for numeric variables is 3, so you &lt;/SPAN&gt;&lt;SPAN&gt;can save space if the variable can fit into one or two characters.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
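&lt;P&gt;One possible way to do that switch, assuming a hypothetical numeric variable cod_flag in work.have whose values are integers between -9 and 99:&lt;/P&gt;
&lt;PRE&gt;data work.have_char;
  /* rename the numeric original so the character copy can take its name */
  set work.have(rename=(cod_flag=cod_flag_num));
  length cod_flag $2;
  /* -L left-aligns the result; missing values stay blank */
  if not missing(cod_flag_num) then cod_flag = put(cod_flag_num, 2. -L);
  drop cod_flag_num;
run;&lt;/PRE&gt;
&lt;P&gt;Keep in mind that the character version sorts as text, so "10" comes before "9".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;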
&lt;P&gt;&lt;SPAN&gt;9.&lt;/SPAN&gt; &lt;SPAN&gt;Switch variables from character to numeric if they are all integers and occupy &lt;/SPAN&gt;&lt;SPAN&gt;more than 3 bytes.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;For example, the number 1234 would occupy 4 bytes as &lt;/SPAN&gt;&lt;SPAN&gt;a character variable but item 6 above shows it would only occupy 3 bytes as &lt;/SPAN&gt;&lt;SPAN&gt;a numeric variable.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
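&lt;P&gt;The reverse switch could be sketched like this, assuming a hypothetical character variable cod_id in work.have that holds only integers no larger than 2,097,152, so that a numeric length of 4 is enough:&lt;/P&gt;
&lt;PRE&gt;data work.have_num;
  /* rename the character original so the numeric copy can take its name */
  set work.have(rename=(cod_id=cod_id_char));
  length cod_id 4;
  /* ?? suppresses the log notes for values that are not valid numbers;
     those values become missing */
  cod_id = input(cod_id_char, ?? 32.);
  drop cod_id_char;
run;&lt;/PRE&gt;
&lt;P&gt;Leading zeros are lost in the conversion, so "0012" becomes 12; skip this for codes where the zeros matter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;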
&lt;P&gt;Hope this helps&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2023 15:13:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871161#M344109</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2023-04-21T15:13:21Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SORT WITH LARGE DATASET</title>
      <link>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871199#M344124</link>
      <description>&lt;UL&gt;
&lt;LI&gt;Move the KEEP= option to the input dataset&lt;/LI&gt;
&lt;LI&gt;Use TAGSORT only if you have a heavily COMPRESSed, large dataset&lt;/LI&gt;
&lt;LI&gt;Use COMPRESS=YES on the output dataset if it will contain large character variables which are mostly empty&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 21 Apr 2023 17:05:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/PROC-SORT-WITH-LARGE-DATASET/m-p/871199#M344124</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-04-21T17:05:27Z</dc:date>
    </item>
  </channel>
</rss>

