<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832826#M329218</link>
    <description>&lt;P&gt;You could help by providing more information.&amp;nbsp; For example, is an index your first choice, or is it your choice of last resort because PROC SORT is failing (or perhaps because this is a permanent SAS data set that you are not allowed to sort)?&amp;nbsp; What operating system are you using?&amp;nbsp; Do you plan to use this index multiple times, or is it for one-time use?&amp;nbsp; When you use the index, will you retrieve all the observations in sorted order (avoiding the need for PROC SORT), or will you retrieve a small subset of the observations each time?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's an old paper that applies mostly to the MVS operating system.&amp;nbsp; Maybe it would help, maybe not, depending on how you would answer some of the questions above.&amp;nbsp;&amp;nbsp;&lt;A href="https://www.beoptimized.be/pdf/SUGI24_39.pdf" target="_blank"&gt;https://www.beoptimized.be/pdf/SUGI24_39.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But be aware that the worst outcome might be that you succeed, but the index takes so long to retrieve the observations that it is not practical to use.&amp;nbsp; So any information you can provide about how you would use the index can only help.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Sep 2022 07:33:22 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2022-09-12T07:33:22Z</dc:date>
    <item>
      <title>how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832811#M329212</link>
      <description>&lt;P&gt;I used this code to create an index on the variable recip in the data set dataset1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data dataset1 (index=(recip));&lt;/P&gt;&lt;P&gt;set dataset1;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The SAS log reported:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 1,847,118,717 observations read from the data set Dataset1.&lt;/P&gt;&lt;P&gt;NOTE: The data set Dataset1 has 1,847,118,717 observations and 20 variables.&lt;/P&gt;&lt;P&gt;NOTE: Sort execution failure.&lt;/P&gt;&lt;P&gt;ERROR: insufficient space in file WORK. 'SASTMP-000000006'n.UTILITY.&lt;/P&gt;&lt;P&gt;NOTE: File&amp;nbsp;WORK. 'SASTMP-000000006'n.UTILITY is damaged. I/O processing did not complete.&lt;/P&gt;&lt;P&gt;WARNING: Limited resources when loading index RECIP for file Dataset1.index.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A solution to this problem would be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 03:46:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832811#M329212</guid>
      <dc:creator>markgeier</dc:creator>
      <dc:date>2022-09-12T03:46:21Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832818#M329215</link>
      <description>&lt;P&gt;I would suggest using PROC DATASETS, but I am not sure that the error message can be avoided. How much free space is there on the drive used for WORK? Maybe sorting with the TAGSORT option of PROC SORT, instead of creating an index, could solve the problem.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 05:45:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832818#M329215</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2022-09-12T05:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832822#M329216</link>
      <description>&lt;P&gt;Is the actual goal here to sort the data set or to create an index?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 06:27:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832822#M329216</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2022-09-12T06:27:43Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832826#M329218</link>
      <description>&lt;P&gt;You could help by providing more information.&amp;nbsp; For example, is an index your first choice, or is it your choice of last resort because PROC SORT is failing (or perhaps because this is a permanent SAS data set that you are not allowed to sort)?&amp;nbsp; What operating system are you using?&amp;nbsp; Do you plan to use this index multiple times, or is it for one-time use?&amp;nbsp; When you use the index, will you retrieve all the observations in sorted order (avoiding the need for PROC SORT), or will you retrieve a small subset of the observations each time?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's an old paper that applies mostly to the MVS operating system.&amp;nbsp; Maybe it would help, maybe not, depending on how you would answer some of the questions above.&amp;nbsp;&amp;nbsp;&lt;A href="https://www.beoptimized.be/pdf/SUGI24_39.pdf" target="_blank"&gt;https://www.beoptimized.be/pdf/SUGI24_39.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But be aware that the worst outcome might be that you succeed, but the index takes so long to retrieve the observations that it is not practical to use.&amp;nbsp; So any information you can provide about how you would use the index can only help.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 07:33:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832826#M329218</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2022-09-12T07:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832865#M329236</link>
      <description>&lt;P&gt;You are dealing with some serious data volumes here:&amp;nbsp;&lt;SPAN&gt;1,847,118,717 observations and 20 variables.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To be successful in dealing with such data, you need an "advanced" understanding of how SAS works and how to write performant code.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just some tips "on the fly":&lt;/P&gt;
&lt;P&gt;- Delete all tables in WORK that you don't need anymore&lt;/P&gt;
&lt;P&gt;- Minimize passes through the data&lt;/P&gt;
&lt;P&gt;- To only create an index, don't use a DATA step that re-creates the table; use PROC DATASETS instead&lt;/P&gt;
&lt;P&gt;- Reduce data volumes as early as possible (by dropping rows or aggregating the data)&lt;/P&gt;
&lt;P&gt;- Investigate whether the SPDE engine could be beneficial&lt;/P&gt;
&lt;P&gt;- Set variable lengths to the minimum required to store the data without truncation (numeric variables included)&lt;/P&gt;
&lt;P&gt;- Avoid any logic that creates high-volume intermediate data (such as a Cartesian join); find an alternative way to reach the desired outcome&lt;/P&gt;
&lt;P&gt;- etc.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 11:39:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832865#M329236</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-09-12T11:39:45Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832876#M329244</link>
      <description>&lt;P&gt;Move the dataset out of WORK (to a library on different physical storage) and create the index with PROC DATASETS (INDEX CREATE) or CREATE INDEX in PROC SQL.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Sep 2022 12:42:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/832876#M329244</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-09-12T12:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: how to avoid a sort execution failure when indexing a large data file (2 billion observations)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/834000#M329727</link>
      <description>&lt;P&gt;Thanks, Kurt. I deleted a bunch of files from the drive where my SAS WORK library is stored. That stopped the program from crashing. I also created the index with the following code, which ran successfully:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc datasets lib=A1;&lt;BR /&gt;modify Dataset1;&lt;BR /&gt;index create recip;&lt;BR /&gt;run;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Sep 2022 22:02:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/how-to-avoid-a-sort-execution-failure-when-indexing-a-large-data/m-p/834000#M329727</guid>
      <dc:creator>markgeier</dc:creator>
      <dc:date>2022-09-17T22:02:35Z</dc:date>
    </item>
  </channel>
</rss>

