<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sorting failure on a dataset with one million records 42 variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670957#M201453</link>
    <description>&lt;P&gt;I have been trying to sort a dataset and then dedupe it, but I keep getting this ERROR message, even after I broke the dataset down by month and the size went from 15 million to 1.1 million records. Can anyone give me some suggestions? I also tried a Linux server, and it failed with the same error message.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC sort data=zz.Out2019_01 out=zz.Out2019_01a(compress=yes);&lt;BR /&gt;by cust_account_number dt_outage descending event_duration;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;ERROR: No disk space is available for the write operation. Filename =&lt;BR /&gt;C:\Users\wxu\AppData\Local\Temp\SAS Temporary&lt;BR /&gt;Files\SAS_util0001000033CC_D-6HPY9Z2\ut33CC000008.utl.&lt;BR /&gt;ERROR: Failure while attempting to write page 1851 of sorted run 124.&lt;BR /&gt;ERROR: Failure while attempting to write page 647355 to utility file 1.&lt;BR /&gt;ERROR: Failure encountered while creating initial set of sorted runs.&lt;BR /&gt;ERROR: Failure encountered during external sort.&lt;BR /&gt;ERROR: Sort execution failure.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Jul 2020 13:07:57 GMT</pubDate>
    <dc:creator>LisaXu</dc:creator>
    <dc:date>2020-07-21T13:07:57Z</dc:date>
    <item>
      <title>Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670957#M201453</link>
      <description>&lt;P&gt;I have been trying to sort a dataset and then dedupe it, but I keep getting this ERROR message, even after I broke the dataset down by month and the size went from 15 million to 1.1 million records. Can anyone give me some suggestions? I also tried a Linux server, and it failed with the same error message.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC sort data=zz.Out2019_01 out=zz.Out2019_01a(compress=yes);&lt;BR /&gt;by cust_account_number dt_outage descending event_duration;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;ERROR: No disk space is available for the write operation. Filename =&lt;BR /&gt;C:\Users\wxu\AppData\Local\Temp\SAS Temporary&lt;BR /&gt;Files\SAS_util0001000033CC_D-6HPY9Z2\ut33CC000008.utl.&lt;BR /&gt;ERROR: Failure while attempting to write page 1851 of sorted run 124.&lt;BR /&gt;ERROR: Failure while attempting to write page 647355 to utility file 1.&lt;BR /&gt;ERROR: Failure encountered while creating initial set of sorted runs.&lt;BR /&gt;ERROR: Failure encountered during external sort.&lt;BR /&gt;ERROR: Sort execution failure.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 13:07:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670957#M201453</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-21T13:07:57Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670982#M201456</link>
      <description>&lt;P&gt;What is your goal here? Just deduping?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 13:37:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670982#M201456</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2020-07-21T13:37:21Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670986#M201457</link>
      <description>&lt;P&gt;I need to sort it first so that, when there are multiple records for the same account on the same day, the one with the longest event_duration is listed first. Then I'll keep only the longest event for that day for that account. Thanks!!&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 13:39:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670986#M201457</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-21T13:39:43Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670989#M201458</link>
      <description>&lt;P&gt;Do you have long character variables, and is the dataset stored with compress=yes or compress=binary? If yes, use the TAGSORT option.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 13:49:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/670989#M201458</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-07-21T13:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671223#M201529</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree that TAGSORT is a likely solution for the reported problem.&amp;nbsp;&amp;nbsp; But why would the dataset compress status make the TAGSORT any more recommendable than if it were not compressed? &lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if the same disk is being used for the output dataset (libname zz) as the intermediate files (in C:\Users\wxu\AppData\Local\Temp\), then I see the benefit of careful choice of compress option on the output.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 21:41:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671223#M201529</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-07-21T21:41:37Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671225#M201530</link>
      <description>&lt;P&gt;Can you confirm that none of your 42 variables are needlessly long (e.g. 10 vars each 200 characters long, even though their values never need more than 10 characters)?&amp;nbsp; If some are, you might be able to substantially reduce the size of your data set (and therefore avoid exceeding available disk space for intermediate sort files) without any loss of information.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 21:51:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671225#M201530</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-07-21T21:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671263#M201549</link>
      <description>&lt;P&gt;It's odd that the Windows server and the Unix server both run out of disk space.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are the disks full on both?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;TAGSORT is an option.&lt;/P&gt;
&lt;P&gt;Another thing you may try, to free up space, is to enable compression on the &lt;EM&gt;utility&lt;/EM&gt;&amp;nbsp;folder&amp;nbsp;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;C:\Users\wxu\AppData\Local\Temp\SAS Temporary&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;Files,&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN style="font-family: inherit;"&gt;since the files stored there are not compressed.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 01:58:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671263#M201549</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-22T01:58:39Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671264#M201550</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt; I'll keep the longest event for that day for that account&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To dedupe, there is no need to use proc sort again; a data step reading the sorted data needs no sort utility file, so it will not hit this disk-space error, and it will probably be faster.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 02:03:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671264#M201550</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-22T02:03:31Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671278#M201552</link>
      <description>&lt;P&gt;Without tagsort, the utility file contains all data from the source, but the way it is built, overall I/O is kept to the bare minimum, so the sort is remarkably fast.&lt;/P&gt;
&lt;P&gt;But the utility file is uncompressed, so in the case of a compressed dataset with a high compression rate it will be much larger than the source; tagsort can be the only option to make the sort work, albeit with a performance penalty.&lt;/P&gt;
&lt;P&gt;That's why I would not recommend tagsort for uncompressed (or moderately compressed) datasets.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 07:26:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671278#M201552</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-07-22T07:26:31Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671348#M201582</link>
      <description>&lt;P&gt;Sorry that I missed so many responses; I was in a training session.&amp;nbsp; Thanks so much for taking the time to respond to my question. To answer the questions:&lt;/P&gt;&lt;P&gt;I thought I didn't have any very long variables, but when I read the posts and checked, I found that quite a few imported variables have a length of $32767. Maybe this is why the sort takes so much space? I watched the temp folder go from 430G free to, finally, the "out of space" message. I was wondering how sorting 1G of data could take that much space. Maybe those variables are the reason? Any comments?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I reassigned the work library location, and the sort now completes.&lt;/P&gt;&lt;P&gt;I also created 2 indices on two variables. I'll see whether I can run the whole year's data now with the lengths modified.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks so much to you all!&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 11:43:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671348#M201582</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-22T11:43:15Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671356#M201586</link>
      <description>&lt;P&gt;All the character variables from the data lake are defined as $32767. Is there a quick way to find out the true length of each of those 18 variables and then reassign the lengths? I used trim(), but that didn't change the defined length.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again!&lt;/P&gt;&lt;P&gt;Lisa&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 12:07:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671356#M201586</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-22T12:07:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671358#M201588</link>
      <description>&lt;P&gt;First of all, &lt;U&gt;&lt;EM&gt;before you do anything else&lt;/EM&gt;&lt;/U&gt;, take a good look at the content of those variables, and fix or amend your import process. Depending on the source type (Excel or text), you either have to add a correcting data step that reduces the length to what is actually needed, or you can set the correct length in the data step that reads your data. PROC IMPORT is good for a "first shot", but not for a really good (and consistent!) result.&lt;/P&gt;
&lt;P&gt;Some of these variables might even be missing for the whole dataset, so you should simply drop them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From your description, I am now very sure that what you experience is a consequence of the dataset being stored with the COMPRESS=YES option, and lots of overlong and sparsely populated variables. Compression rates of 99% and better can easily come from this, and the uncompressed utility file (either of a sort or a SQL) blows up your WORK/UTIL location, unless you use TAGSORT (which means that only the BY variable(s) and a record index are stored in the utility file).&lt;/P&gt;
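&lt;P&gt;As a rough back-of-the-envelope check, assuming the 18 character variables mentioned in this thread are each defined as $32767 and the monthly dataset has about 1.1 million observations, the uncompressed size of those variables alone can be estimated as follows. The other variables and any page overhead are ignored, and the variable names in the step are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  n_obs    = 1.1e6;   /* observations in the monthly dataset (from the thread) */
  n_char   = 18;      /* character variables defined as $32767 (from the thread) */
  char_len = 32767;   /* bytes stored per value when uncompressed */
  /* bytes = observations * variables * length; divide by 1e9 for GB */
  est_gb = n_obs * n_char * char_len / 1e9;
  put 'Estimated uncompressed size of the character variables (GB): ' est_gb 8.1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That comes to roughly 650 GB, which would be consistent with a temp folder that had 430 GB free filling up during the sort.&lt;/P&gt;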
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, once again, get to know your data, improve/fix your import process, and if you still end up with large character variables and a good compression rate, use TAGSORT.&lt;/P&gt;
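&lt;P&gt;A minimal sketch of the TAGSORT variant, reusing the library, dataset, and BY variables from the original post. The name of the deduplicated output dataset is hypothetical, and the sketch assumes that dt_outage holds a date (one value per day) rather than a datetime.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* TAGSORT keeps only the BY variables and a record index in the utility file */
proc sort data=zz.Out2019_01 out=zz.Out2019_01a(compress=yes) tagsort;
  by cust_account_number dt_outage descending event_duration;
run;

/* Keep the first record per account and day; output name is hypothetical */
data zz.Out2019_01_dedup(compress=yes);
  set zz.Out2019_01a;
  by cust_account_number dt_outage;
  if first.dt_outage;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because event_duration is sorted in descending order within each account and day, the record flagged by FIRST.dt_outage is the longest event.&lt;/P&gt;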
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 12:16:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671358#M201588</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-07-22T12:16:50Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671359#M201589</link>
      <description>&lt;P&gt;Run this for a quick shot:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select
  max(length(a)) as max_a,
  max(length(b)) as max_b,
  /* and so on */
  max(length(x)) as max_x /* no comma here! */
from your_dataset;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then do this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data your_data_improved;
/* example lengths only: set them according to what the SQL revealed */
length a $10 b $25 x $40;
set your_dataset;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You will get a WARNING about truncated data, but you can safely ignore that, as (because of the SQL) you &lt;EM&gt;know what you are doing&lt;/EM&gt;.&lt;/P&gt;
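&lt;P&gt;With many variables, typing the SELECT list by hand gets tedious. One possible shortcut, sketched here under the assumption that the dataset is WORK.YOUR_DATASET, is to let DICTIONARY.COLUMNS generate the list for every character variable. The macro variable name maxlist is made up for this example, and the max_ prefix assumes variable names of at most 28 characters.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Build one max(length(var)) expression per character variable */
proc sql noprint;
  select cats('max(length(', name, ')) as max_', name)
    into :maxlist separated by ', '
    from dictionary.columns
    where libname = 'WORK'            /* library name, in upper case */
      and memname = 'YOUR_DATASET'    /* dataset name, in upper case */
      and type = 'char';
quit;

/* Run the generated query to report the longest value of each variable */
proc sql;
  select &amp;amp;maxlist
    from work.your_dataset;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that LENGTH returns 1 for a blank value, so a variable that is missing throughout will also show a maximum of 1.&lt;/P&gt;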
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Alternatively, you can set appropriate lengths in the step that reads your data from the external source, if you do that with a data step (if you used PROC IMPORT for a text file (CSV or similar), you can extract this data step from the log and modify it to your needs).&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 12:25:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671359#M201589</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-07-22T12:25:55Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671360#M201590</link>
      <description>&lt;P&gt;Yes, I learned my lesson. I normally check a dataset before I work on it. I read this one from the data lake, it looked normal when I viewed it in SAS, and I was in a rush to report some summaries to the team, so it didn't register that the variables were so long. I'm now working on a macro to adjust the lengths and am also adding tagsort. I did try tagsort before I posted the question. This is the first time I have seen variables set to this length. I do appreciate your feedback, Kurt.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 12:26:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671360#M201590</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-22T12:26:44Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671361#M201591</link>
      <description>It is thanks to you that I looked into the lengths of the variables. I appreciate it!</description>
      <pubDate>Wed, 22 Jul 2020 12:32:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671361#M201591</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-22T12:32:18Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting failure on a dataset with one million records 42 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671362#M201592</link>
      <description>I did add tagsort and changed the utility folder. Thanks a lot!</description>
      <pubDate>Wed, 22 Jul 2020 12:33:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-failure-on-a-dataset-with-one-million-records-42/m-p/671362#M201592</guid>
      <dc:creator>LisaXu</dc:creator>
      <dc:date>2020-07-22T12:33:25Z</dc:date>
    </item>
  </channel>
</rss>

