<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SQL sort execution failure with big data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790906#M253257</link>
    <description>&lt;P&gt;i am trying to use this technique in a subset of the original data , however i've the following issue&amp;nbsp;&lt;/P&gt;&lt;P&gt;WARNING: In a call to the CATX function, the buffer allocated for the result was not long enough to contain the concatenation of&lt;BR /&gt;all the arguments. The correct result would contain 336 characters, but the actual result might either be truncated to 200&lt;BR /&gt;character(s) or be completely blank, depending on the calling environment. The following note indicates the left-most&lt;BR /&gt;argument that caused truncation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i am using the macro below and having a lot of key variables instead of ID and Key.&lt;/P&gt;&lt;P&gt;Any idea how to support length ( greater than 200 ) ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%macro agg (in=, out=, N_segments=) ;&lt;BR /&gt;%local X seg ;&lt;BR /&gt;%let X=1+mod(input(char(MD5(catx(":",ID,Key)),10),pib1.),&amp;amp;N_segments) ;&lt;BR /&gt;%do seg = 1 %to &amp;amp;N_segments ;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table seg as&lt;BR /&gt;select ID, Key, sum(Var) as SUM, count (distinct Var) as UCOUNT&lt;BR /&gt;from &amp;amp;in where &amp;amp;X = &amp;amp;seg group ID, Key;&lt;BR /&gt;quit;&lt;BR /&gt;proc append base=&amp;amp;out data=seg; run;&lt;BR /&gt;%end ;&lt;BR /&gt;%mend ;&lt;BR /&gt;%agg (in=Trans, out=Agg, N_segments=3)&lt;/P&gt;</description>
    <pubDate>Wed, 19 Jan 2022 15:10:33 GMT</pubDate>
    <dc:creator>bebess</dc:creator>
    <dc:date>2022-01-19T15:10:33Z</dc:date>
    <item>
      <title>SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790841#M253228</link>
      <description>&lt;P&gt;Hi all ,&lt;/P&gt;&lt;P&gt;i am having an issue when aggregating result from a table using an SQL procedure that perform basic sum calculations and a group by .&amp;nbsp;&lt;/P&gt;&lt;P&gt;i am having the issue with a big dataset 177GB ( 251 951 350 observations and 138 variables), below the error message :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: At least one nonessential grouping column reference has been removed from a GROUP BY's reference list.&lt;BR /&gt;WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity&lt;BR /&gt;problem.&lt;BR /&gt;ERROR: Sort execution failure.&lt;/P&gt;&lt;P&gt;NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.&lt;BR /&gt;MPRINT(AGGREGATE): QUIT;&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;real time 1:08:47.47&lt;BR /&gt;user cpu time 11:37.53&lt;BR /&gt;system cpu time 2:51.71&lt;BR /&gt;memory 1056479.00k&lt;BR /&gt;OS Memory 1076112.00k&lt;BR /&gt;Timestamp 01/18/2022 08:53:46 PM&lt;BR /&gt;Step Count 172 Switch Count 1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i know te first note refers to&amp;nbsp;&lt;A href="https://support.sas.com/kb/16/074.html" target="_blank"&gt;16074 - "WARNING: At least one nonessential grouping column reference has been removed from a GROUP BY's reference list" (sas.com)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The other point is that i'm overwritting the table by using the same name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i'm sure the SQL procedure is using more space than the total space of the table because i'm having 2TB of free space in my Work session and i've noticed a huge consumption of space,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i am using compress=Yes option and the UTILOC refers to my work drive&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My questions are :&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. How much space the SQL procedure needs ( 4 times the space of the data ? )&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. is there an alternative to my query&amp;nbsp; ( that will not&amp;nbsp; over consume memory or disk space ) ?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table xxx as&amp;nbsp;&lt;/P&gt;&lt;P&gt;select "result" as id,&lt;/P&gt;&lt;P&gt;list_of_var,&lt;/P&gt;&lt;P&gt;sum(var_n) as var_new&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;from xxx&lt;/P&gt;&lt;P&gt;group by id, list_of_var&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 10:26:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790841#M253228</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T10:26:52Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790843#M253229</link>
      <description>&lt;P&gt;Do you have cas language avaible, do you have sas viya?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 10:34:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790843#M253229</guid>
      <dc:creator>acordes</dc:creator>
      <dc:date>2022-01-19T10:34:44Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790849#M253230</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) How many different "by groups" you have in that dataset? are they milions or maybe tousands?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) A "quick try" could be to use Proc Sort on the data set with TAGSORT option and then use data step by-group processing to sum up your data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 11:16:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790849#M253230</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2022-01-19T11:16:21Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790850#M253231</link>
      <description>No it's SAS Enterprise Guide 7.15 , servers hosted in Azure</description>
      <pubDate>Wed, 19 Jan 2022 11:19:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790850#M253231</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T11:19:48Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790851#M253232</link>
      <description>1) i expect to have around 190 348 410 obs for the aggregated table.&lt;BR /&gt;2 ) will have a look on that &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Wed, 19 Jan 2022 11:26:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790851#M253232</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T11:26:13Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790856#M253235</link>
      <description>&lt;P&gt;Look at this video and article by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;&amp;nbsp;, it may give you some hints how to process&amp;nbsp; your data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Uniform-Hashing-of-Arbitrary-Input-Into-Key-Exclusive-Segments/ta-p/735358" target="_blank"&gt;https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Uniform-Hashing-of-Arbitrary-Input-Into-Key-Exclusive-Segments/ta-p/735358&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 11:48:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790856#M253235</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2022-01-19T11:48:25Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790859#M253236</link>
      <description>&lt;P&gt;Sorting - usually y9u need around 2-3 times the size of the table in your UTILLOC. Assuming it has the same location as your saswork?&lt;/P&gt;
&lt;P&gt;2TB should be enough, have monitored during execution?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can see that SQL "only" uses 1GB om RAM. If you have more available to you, maximize MEMSIZE and SORTSIZE options.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;TAGSORT is resource effective, but it will take substantiable longer time than the default sort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And I guess that you really need all input columns as output?&lt;/P&gt;
&lt;P&gt;And the data is in a Base SAS library?&lt;/P&gt;
&lt;P&gt;Consider using SPDE. It might not solve your current issue, but for large data sets it faster for many use cases. Then you should take a look at the SPDESORTSIZE option.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 11:56:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790859#M253236</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2022-01-19T11:56:22Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790866#M253238</link>
      <description>&lt;P&gt;In an "ideal sort", you need three times the size of the table&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;original table&lt;/LI&gt;
&lt;LI&gt;utility file&lt;/LI&gt;
&lt;LI&gt;resulting table&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If the original table is compressed (dataset or system option COMPRESS=yes), the utility file will be larger, sometimes MUCH larger (think of a compression ratio of 95%, the utility file will be 20 times as large). In this case, use the TAGSORT option of PROC SORT. Since this option is not available in SQL, you are better off running your summation in two steps:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use PROC SORT with BY&amp;nbsp;&lt;SPAN&gt;id list_of_var&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use PROC SUMMARY with the same BY&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;If there are more variables in your source dataset than those used in BY and for the summation, drop all other variables when sorting, and create an intermediary table:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort
  data=xxx (keep=list_of_var var_n)
  out=yyy
  /* (compress=yes)  tagsort if long character variables exist */
;
by list_of_var;
run;

proc summary data=yyy;
by list_of_var;
var var_n;
output out=zzz sum()=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;You also seem to have non-adequate storage or other CPU-consuming processes, recognizable here:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;real time 1:08:47.47
user cpu time 11:37.53
system cpu time 2:51.71&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;Your real time is more than 4 times the CPU time, which points either to wait states caused by the storage, or contention for CPU resources.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 12:36:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790866#M253238</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-19T12:36:37Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790880#M253244</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="bebess_0-1642596997952.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/67501iDD6FBCF2211B45DD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="bebess_0-1642596997952.png" alt="bebess_0-1642596997952.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;only this job was running on the server and i've seen the error in this time .&lt;/P&gt;&lt;P&gt;i need 90% of columns from the table but as i 'm also created new calculated variables so at the end it's like i am having same number of columns but with less observations .&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes it's a BASE SAS library , a basic SAS table&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 13:02:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790880#M253244</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T13:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790906#M253257</link>
      <description>&lt;P&gt;i am trying to use this technique in a subset of the original data , however i've the following issue&amp;nbsp;&lt;/P&gt;&lt;P&gt;WARNING: In a call to the CATX function, the buffer allocated for the result was not long enough to contain the concatenation of&lt;BR /&gt;all the arguments. The correct result would contain 336 characters, but the actual result might either be truncated to 200&lt;BR /&gt;character(s) or be completely blank, depending on the calling environment. The following note indicates the left-most&lt;BR /&gt;argument that caused truncation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i am using the macro below and having a lot of key variables instead of ID and Key.&lt;/P&gt;&lt;P&gt;Any idea how to support length ( greater than 200 ) ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%macro agg (in=, out=, N_segments=) ;&lt;BR /&gt;%local X seg ;&lt;BR /&gt;%let X=1+mod(input(char(MD5(catx(":",ID,Key)),10),pib1.),&amp;amp;N_segments) ;&lt;BR /&gt;%do seg = 1 %to &amp;amp;N_segments ;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table seg as&lt;BR /&gt;select ID, Key, sum(Var) as SUM, count (distinct Var) as UCOUNT&lt;BR /&gt;from &amp;amp;in where &amp;amp;X = &amp;amp;seg group ID, Key;&lt;BR /&gt;quit;&lt;BR /&gt;proc append base=&amp;amp;out data=seg; run;&lt;BR /&gt;%end ;&lt;BR /&gt;%mend ;&lt;BR /&gt;%agg (in=Trans, out=Agg, N_segments=3)&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 15:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790906#M253257</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T15:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790914#M253259</link>
      <description>&lt;P&gt;Maxim 1.&lt;/P&gt;
&lt;P&gt;From the documentation of the &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0p7wxtk0hvn83n1pveisbcp2ae9.htm" target="_blank" rel="noopener"&gt;CATX Function&lt;/A&gt;:&lt;/P&gt;
&lt;H3 id="n1wudi40ycsm3jn1gu30ja4v2mta" class="xisDoc-title"&gt;Length of Returned Variable: Special Cases&lt;/H3&gt;
&lt;P class="xisDoc-paragraph"&gt;The CATX function returns a value to a variable or returns a value in a temporary buffer. The value that is returned from the CATX function has one of these lengths:&lt;/P&gt;
&lt;UL class="xisDoc-listUnordered"&gt;
&lt;LI class="xisDoc-item"&gt;up to 200 characters in WHERE clauses and in PROC SQL&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try to use PROC SORT and PROC SUMMARY/MEANS (as I already suggested) first on the whole dataset instead of resorting to splitting it up.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 15:30:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790914#M253259</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-19T15:30:50Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790916#M253260</link>
      <description>ok will try &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;</description>
      <pubDate>Wed, 19 Jan 2022 15:36:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/790916#M253260</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-19T15:36:31Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791084#M253331</link>
      <description>The sort takes too long time, any idea how to solve the issue of having real time very higher to the CPU + system times ?&lt;BR /&gt;&lt;BR /&gt;The summary time is more acceptable .&lt;BR /&gt;&lt;BR /&gt;NOTE: There were 251951350 observations read from the data set TEMP.EA_RESULT_4491.&lt;BR /&gt;NOTE: SAS threaded sort was used.&lt;BR /&gt;NOTE: The data set WORK.YYY has 251951350 observations and 82 variables.&lt;BR /&gt;NOTE: Compressing data set WORK.YYY decreased size by 85.55 percent.&lt;BR /&gt;Compressed is 910365 pages; un-compressed would require 6298784 pages.&lt;BR /&gt;NOTE: PROCEDURE SORT used (Total process time):&lt;BR /&gt;real time 9:14:07.85&lt;BR /&gt;user cpu time 50:21.95&lt;BR /&gt;system cpu time 14:06.46&lt;BR /&gt;memory 1059797.87k&lt;BR /&gt;OS Memory 1081344.00k&lt;BR /&gt;Timestamp 01/20/2022 02:07:44 AM&lt;BR /&gt;Step Count 71 Switch Count 50140&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;NOTE: There were 251951350 observations read from the data set WORK.YYY.&lt;BR /&gt;NOTE: The data set TEMP.ZZZ has 243403020 observations and 82 variables.&lt;BR /&gt;NOTE: Compressing data set TEMP.ZZZ decreased size by 85.48 percent.&lt;BR /&gt;Compressed is 883779 pages; un-compressed would require 6085076 pages.&lt;BR /&gt;NOTE: PROCEDURE SUMMARY used (Total process time):&lt;BR /&gt;real time 1:10:18.26&lt;BR /&gt;user cpu time 39:05.53&lt;BR /&gt;system cpu time 4:26.67&lt;BR /&gt;memory 3041.71k&lt;BR /&gt;OS Memory 24680.00k&lt;BR /&gt;Timestamp 01/20/2022 03:18:02 AM&lt;BR /&gt;Step Count 72 Switch Count 6666&lt;BR /&gt;</description>
      <pubDate>Thu, 20 Jan 2022 07:55:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791084#M253331</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-20T07:55:50Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791096#M253339</link>
      <description>&lt;P&gt;In both cases you are clearly I/O bound, especially during the sort.&lt;/P&gt;
&lt;P&gt;Where is your TEMP library located?&lt;/P&gt;
&lt;P&gt;Try to use the TAGSORT option in PROC SORT, this will at least reduce the size of the utility file, &lt;U&gt;&lt;STRONG&gt;but&lt;/STRONG&gt;&lt;/U&gt; if your real bottleneck is TEMP, then it will cause even longer time (as the source file needs to be read twice, and random at that).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;All your working datasets (not something you keep as an "archive" and access very rarely) should be kept on local disks or disks accessed through a high-speed SAN. SAS is I/O intensive and needs top-level storage with large amounts of data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bottom-line: for more in-depth help, we need to know the details of your SAS setup (operating system - virtual machine?, type and number of processors available, layout of disk storage and location of libraries in there).&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jan 2022 09:04:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791096#M253339</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-20T09:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791137#M253356</link>
      <description>&lt;P&gt;TEMP refers is under Azure location like SAS servers but probably not in the same place exactly ( will check this point )&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;sas guide version :&amp;nbsp;&amp;nbsp;7.15 HF3 (7.100.5.6132) (64-bit)&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Virtual machine :&amp;nbsp;Windows (Windows Server 2016 Datacenter)&lt;/P&gt;&lt;P&gt;Standard D16as v4 (16 virtual processors, 64 Gio of memory)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Group=PERFORMANCE&lt;BR /&gt;ARMAGENT= Specifies an ARM agent (which is an executable module or keyword, such as LOG4SAS) that contains a specific&lt;BR /&gt;implementation of the ARM API.&lt;BR /&gt;ARMLOC=ARMLOG.LOG Specifies the location of the ARM log.&lt;BR /&gt;ARMSUBSYS=(ARM_NONE)&lt;BR /&gt;Specifies the SAS ARM subsystems to enable or disable.&lt;BR /&gt;BUFNO=1 Specifies the number of buffers for processing SAS data sets.&lt;BR /&gt;BUFSIZE=0 Specifies the size of a buffer page for output SAS data sets.&lt;BR /&gt;CGOPTIMIZE=3 Specifies the level of optimization to perform during code compilation.&lt;BR /&gt;CMPMODEL=BOTH Specifies the output model type for the MODEL procedure.&lt;BR /&gt;CMPOPT=(NOEXTRAMATH NOMISSCHECK NOPRECISE NOGUARDCHECK NOGENSYMNAMES NOFUNCDIFFERENCING SHORTCIRCUIT NOPROFILE NODEBUGHOST&lt;BR /&gt;NODEBUGPORT)&lt;BR /&gt;Specifies the type of code-generation optimizations to use in the SAS language compiler.&lt;BR /&gt;COMPRESS=YES Specifies the type of compression to use for observations in output SAS data sets.&lt;BR /&gt;CPUCOUNT=4 Specifies the number of processors that thread-enabled applications should assume are available for concurrent&lt;BR /&gt;processing.&lt;BR /&gt;NODBIDIRECTEXEC The SQL pass-through facility does not optimize the handling of SQL statements.&lt;BR /&gt;DBSLICEPARM=(THREADED_APPS, 2)&lt;BR /&gt;Specifies whether SAS procedures, applications, and the DATA step can read DBMS tables in parallel, and the&lt;BR /&gt;number of threads to use to read the DBMS tables.&lt;BR /&gt;DS2ACCEL=NONE Provides support for DS2 code pass-through acceleration.&lt;BR /&gt;DSACCEL=NONE Provides support for code pass-through acceleration.&lt;BR /&gt;HADOOPPLATFORM=MAPRED&lt;BR /&gt;Specifies the execution platform for the SAS In-Database Code Accelerator for Hadoop.&lt;BR /&gt;2 The SAS System 09:27 Wednesday, January 19, 2022&lt;/P&gt;&lt;P&gt;MAXSEGRATIO=75 Specifies the upper limit for the percentage of index segments that the SPD Engine identifies as containing the&lt;BR /&gt;value referenced in the WHERE expression.&lt;BR /&gt;MEXECSIZE=65536 Specifies the maximum macro size that can be executed in memory.&lt;BR /&gt;MINPARTSIZE=16777216&lt;BR /&gt;Specifies the minimum size of the data component partitions for SPD Engine data sets.&lt;BR /&gt;SORTSIZE=1073741824&lt;BR /&gt;Specifies the amount of memory that is available to the SORT procedure.&lt;BR /&gt;NOSPDEFILECACHE Disables caching of opened SPD Engine files.&lt;BR /&gt;SPDEINDEXSORTSIZE=33554432&lt;BR /&gt;Specifies the memory size for sorting index values.&lt;BR /&gt;SPDEMAXTHREADS=0 Specifies the maximum number of threads that the SPD Engine can spawn for I/O processing.&lt;BR /&gt;SPDEPARALLELREAD=NO&lt;BR /&gt;Enables or disables SPD Engine parallel reads when no WHERE clause is in effect.&lt;BR /&gt;SPDESORTSIZE=33554432&lt;BR /&gt;Specifies the memory size that is used for sorting by the SPD Engine.&lt;BR /&gt;SPDEUTILLOC= Specifies one or more locations where the SPD Engine can temporarily store utility files.&lt;BR /&gt;SPDEWHEVAL=COST Specifies the WHERE statement evaluation process for the SPD Engine.&lt;BR /&gt;SQLGENERATION=(NONE DBMS='TERADATA DB2 ORACLE NETEZZA ASTER GREENPLM HADOOP SAPHANA IMPALA HAWQ POSTGRES REDSHIFT SQLSVR VERTICA')&lt;BR /&gt;Specifies whether and when SAS procedures generate SQL for in-database processing of source data.&lt;BR /&gt;SQLREDUCEPUT=DBMS For PROC SQL, specifies the engine type to use to optimize a PUT function in a query.&lt;BR /&gt;SQLREDUCEPUTOBS=0 For PROC SQL, specifies the minimum number of observations that must be in a table for PROC SQL to optimize the&lt;BR /&gt;PUT function in a query.&lt;BR /&gt;SQLREDUCEPUTVALUES=0&lt;BR /&gt;For PROC SQL, specifies the maximum number of SAS format values that can exist in a PUT function expression to&lt;BR /&gt;optimize the PUT function in a query.&lt;BR /&gt;STRIPESIZE= Specifies path and size pairs to identify I/O device stripe size. Stripe size indicates page size when creating&lt;BR /&gt;a data set or utility file.&lt;BR /&gt;THREADS Uses threaded processing for SAS applications that support it.&lt;BR /&gt;UBUFNO=0 Specifies the number of utility file buffers.&lt;BR /&gt;UBUFSIZE=0 Specifies the size of utility file buffers.&lt;BR /&gt;UTILLOC=WORK Specifies one or more file system locations in which threaded applications can store utility files.&lt;BR /&gt;VBUFSIZE=65536 Specifies the buffer size for a view.&lt;BR /&gt;&lt;BR /&gt;SLEEPCNTL=NO Specifies whether SAS prevents Windows from going into sleep mode.&lt;BR /&gt;ALIGNSASIOFILES Aligns SAS files on a page boundary for improved performance.&lt;BR /&gt;MEMSIZE=68947703040&lt;BR /&gt;Specifies the limit on the amount of virtual memory that can be used during a SAS session.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jan 2022 13:36:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791137#M253356</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-20T13:36:51Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791139#M253357</link>
      <description>&lt;P&gt;If your SAS processing happens in the same Azure instance, that would be fine, but if you have in fact an external network connection from the SAS server to it (read: over the internet or similar), then that will slow it down considerably.&lt;/P&gt;
&lt;P&gt;Download the whole dataset once to your local environment, and proceed from there.&lt;/P&gt;
&lt;P&gt;Is your Windows server that runs SAS located in the Azure cloud?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jan 2022 13:49:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791139#M253357</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-20T13:49:42Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791161#M253367</link>
      <description>Yes Windows server that runs SAS is in Azure cloud&lt;BR /&gt;It looks like the threaded procedures ( SQL , SORT ...etc) takes much more time , because another data step in my process shows more acceptable time without a big difference between real time and CPU time&lt;BR /&gt;NOTE: There were 140027790 observations read from the data set WORK. MYSASTABLE.&lt;BR /&gt;NOTE: The data set WORK.MYSASTABLE has 251951350 observations and 131 variables.&lt;BR /&gt;NOTE: Compressing data set WORK. MYSASTABLE decreased size by 81.14 percent.&lt;BR /&gt;Compressed is 1187708 pages; un-compressed would require 6298784 pages.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 1:00:03.45&lt;BR /&gt;user cpu time 53:12.29&lt;BR /&gt;system cpu time 1:59.54&lt;BR /&gt;memory 2564.81k&lt;BR /&gt;OS Memory 22720.00k&lt;BR /&gt;Timestamp 01/18/2022 03:27:44 PM&lt;BR /&gt;Step Count 77 Switch Count 0&lt;BR /&gt;</description>
      <pubDate>Thu, 20 Jan 2022 14:40:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791161#M253367</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-20T14:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791181#M253373</link>
      <description>&lt;P&gt;A data step that reads and writes from/to a compressed dataset will produce much less I/O than a sort and any operation in SQL that requires a utility file, as the utility files will be uncompressed. Show us the log of this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.test1(compress=yes);
set temp.ea_result_4491;
run;

data work.test2 (compress=yes);
set work.test1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You will then see the relative performance between the two libraries, and if (as I suspect) reading from TEMP is significantly slower, then you need to contemplate moving as much data as possible to the SAS server (and keeping it there), or get in touch with the people responsible for setting up that library storage-wise.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jan 2022 15:58:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791181#M253373</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-20T15:58:32Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791352#M253462</link>
      <description>i've tested the sort procedure having the table in my Work sesson ( local ) and i haven't notice an improvment of performance, i've still a big difference cpu and real time , the sort takes around 8 hours !&lt;BR /&gt;do you think it can improve performance if i use option SPDESORTSIZE , actually i can see the value is very low , what do you think ?&lt;BR /&gt;&lt;BR /&gt;SPDESORTSIZE=33554432&lt;BR /&gt;Specifies the memory size that is used for sorting by the SPD Engine.</description>
      <pubDate>Fri, 21 Jan 2022 08:10:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791352#M253462</guid>
      <dc:creator>bebess</dc:creator>
      <dc:date>2022-01-21T08:10:48Z</dc:date>
    </item>
    <item>
      <title>Re: SQL sort execution failure with big data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791396#M253478</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/55544"&gt;@bebess&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;i've tested the sort procedure having the table in my Work sesson ( local ) and i haven't notice an improvment of performance, i've still a big difference cpu and real time , the sort takes around 8 hours !&lt;BR /&gt;&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Which means that your storage is not up to the task. Get in touch with your SAS administrators, they need to improve the performance if you need to work with such data sizes regularly.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Jan 2022 12:26:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SQL-sort-execution-failure-with-big-data/m-p/791396#M253478</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-01-21T12:26:15Z</dc:date>
    </item>
  </channel>
</rss>

