<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sorting large datasets on multiple variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308085#M66063</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/83078"&gt;@SuryaKiran&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;The macro code you provided creates 5x19 observations. It's not subsetting the data. It is just repeating it 5 times.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You completely missed what I was pointing at. I just wanted to illustrate that using a set and by statement to combine separately sorted datasets will preserve the correct sort order in the output dataset.&lt;/P&gt;
&lt;P&gt;Please observe that this discussion thread deals with &lt;U&gt;sorting&lt;/U&gt;, not subsetting.&lt;/P&gt;</description>
    <pubDate>Sat, 29 Oct 2016 11:06:14 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2016-10-29T11:06:14Z</dc:date>
    <item>
      <title>Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307788#M65972</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to sort a large dataset on 4 variables. The dataset is 100 GB, and the sort uses up the entire WORK space in the background and cannot complete. Is there a more efficient way than the traditional PROC SORT or PROC SQL? I have tried both.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SORT DATA = xyz;&lt;/P&gt;&lt;P&gt;BY a b c d;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SQL;&lt;/P&gt;&lt;P&gt;CREATE TABLE xyz1 AS&lt;/P&gt;&lt;P&gt;SELECT *&lt;/P&gt;&lt;P&gt;FROM XYZ&lt;/P&gt;&lt;P&gt;ORDER BY a,b,c,d;&lt;/P&gt;&lt;P&gt;QUIT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your time&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 00:05:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307788#M65972</guid>
      <dc:creator>div44</dc:creator>
      <dc:date>2016-10-28T00:05:32Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307792#M65974</link>
      <description>&lt;P&gt;Try compressing your table first then sorting with compression on:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options compress = binary;

DATA xyz;
  set xyz;
run;

proc sort data = xyz;
etc....&lt;/CODE&gt;&lt;/PRE&gt;
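&lt;P&gt;A variant, as a sketch only: the COMPRESS= data set option limits compression to the datasets that name it instead of switching it on for the whole session. The names xyz_c and xyz_sorted are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* compress only this copy; the global COMPRESS option stays unchanged */
data xyz_c (compress = binary);
  set xyz;
run;

/* keep the sorted output compressed as well */
proc sort data = xyz_c out = xyz_sorted (compress = binary);
  by a b c d;
run;

/* PROC CONTENTS reports the compression setting of the result */
proc contents data = xyz_sorted;
run;&lt;/CODE&gt;&lt;/PRE&gt;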
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 00:26:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307792#M65974</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-10-28T00:26:35Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307797#M65976</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;Thanks for the suggestion. The dataset which I am using is already compressed.</description>
      <pubDate>Fri, 28 Oct 2016 00:57:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307797#M65976</guid>
      <dc:creator>div44</dc:creator>
      <dc:date>2016-10-28T00:57:22Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307798#M65977</link>
      <description>&lt;P&gt;100GB on anything is going to be a bit of a nightmare.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What are you trying to achieve?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 01:10:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307798#M65977</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-10-28T01:10:15Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307800#M65978</link>
      <description>&lt;P&gt;Ok, then sort the dataset in chunks and interleave the chunks into one final sorted version:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data = xyz (firstobs = 1 obs = 100000)
               out = chunk1
              ;
  by a b c d;
run;

proc sort data = xyz (firstobs = 100001 obs = 200000)
               out = chunk2
              ;
  by a b c d;
run;

data final;
  set chunk1
      chunk2;
  by a b c d;
run;
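
/* Optional check, added as a sketch: reading FINAL with the same BY
   statement stops with a "BY variables are not properly sorted" error
   if the interleave did not produce the expected order. */
data _null_;
  set final;
  by a b c d;
run;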
       
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 28 Oct 2016 05:48:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307800#M65978</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-10-28T05:48:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307803#M65980</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;That doesn't guarantee that the chunks will be in the overall correct sort order, does it?&lt;/P&gt;
&lt;P&gt;Does it find the 'smallest' records required and then put them to the output data set?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 02:22:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307803#M65980</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-10-28T02:22:10Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307818#M65984</link>
      <description>&lt;P&gt;Have you considered using the TAGSORT option?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 02:37:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307818#M65984</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-10-28T02:37:58Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307822#M65987</link>
      <description>&lt;P&gt;One way is to split this big table into many small tables and combine them later.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data F_11 M_11 ......;
 set big;
 select;
 when(sex='F' and age=11) output F_11;
 ..........



data want;
 set F_11 M_11............&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 28 Oct 2016 03:06:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307822#M65987</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-10-28T03:06:45Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307823#M65988</link>
      <description>&lt;P&gt;If you take the approach of splitting the data set up and sorting the pieces, you can improve the speed a little if you know something about the distribution of the first BY variable. &amp;nbsp;For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data a1 a2 a3;&lt;/P&gt;
&lt;P&gt;set huge;&lt;/P&gt;
&lt;P&gt;if a &amp;lt; '4' then output a1;&lt;/P&gt;
&lt;P&gt;else if a &amp;lt; '6' then output a2;&lt;/P&gt;
&lt;P&gt;else output a3;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
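&lt;P&gt;The sort step in between is not spelled out here; a minimal sketch, assuming each piece is simply sorted in place with the same BY variables, could be:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* each sort only has to handle one piece, one after the other */
proc sort data = a1;
  by a b c d;
run;

proc sort data = a2;
  by a b c d;
run;

proc sort data = a3;
  by a b c d;
run;&lt;/CODE&gt;&lt;/PRE&gt;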
&lt;P&gt;After sorting A1, A2, and A3 (BY A B C D), you can combine the sorted pieces more simply:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set a1 a2 a3;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A simple SET statement will be faster than anything that involves a BY statement for this final step.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 03:15:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307823#M65988</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-10-28T03:15:21Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307831#M65992</link>
      <description>&lt;P&gt;I have not used it. Does it reduce the background memory usage?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 03:45:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307831#M65992</guid>
      <dc:creator>div44</dc:creator>
      <dc:date>2016-10-28T03:45:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307835#M65994</link>
      <description>&lt;P&gt;That's what the doc says&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="paragraph"&gt;"&lt;FONT color="#000080"&gt;The TAGSORT option is useful in single-threaded situations where there might not be enough disk space to sort a large SAS data set. The TAGSORT option is not supported for multi-threaded sorts.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&lt;FONT color="#000080"&gt;When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time might be much higher.&lt;/FONT&gt;"&lt;/DIV&gt;</description>
      <pubDate>Fri, 28 Oct 2016 04:20:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307835#M65994</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-10-28T04:20:12Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307841#M65997</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;I would have thought that SET plus BY will interleave the chunks to maintain the correct sorted order. Happy to be put right if that is not correct.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 05:52:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307841#M65997</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-10-28T05:52:07Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307843#M65999</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;That doesn't guarantee that the chunks will be in the overall correct sort order, does it?&lt;/P&gt;
&lt;P&gt;Does it find the 'smallest' records required and then put them to the output data set?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;It has been my experience (up to now) that using equally sorted datasets in one set statement, followed by the respective by, will keep the sort order in the output.&lt;/P&gt;
&lt;P&gt;Like&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro test_sort (num);
%do i = 1 %to &amp;amp;num;
proc sort data=sashelp.class out=class&amp;amp;i;
by weight height age sex name;
run;
%end;

data want;
set
%do i = 1 %to &amp;amp;num;
  class&amp;amp;i
%end;
;
by weight height age sex name;
run;
%mend;
%test_sort(5);
proc print data=want;run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 28 Oct 2016 06:30:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307843#M65999</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-10-28T06:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307844#M66000</link>
      <description>I assume that you have already set MEMSIZE and SORTSIZE to the maximum values available. &lt;BR /&gt;Disk is cheap; expanding your WORK drive is a no-brainer with data sizes like that.</description>
      <pubDate>Fri, 28 Oct 2016 06:36:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307844#M66000</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-10-28T06:36:50Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307848#M66002</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Try compressing your table first then sorting with compression on:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options compress = binary;

DATA xyz;
  set xyz;
run;

proc sort data = xyz;
etc....&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I just ran a test here with SAS 9.2 on AIX.&lt;/P&gt;
&lt;P&gt;The compress system option does &lt;U&gt;not&lt;/U&gt; affect the utility file that is created during the sort, so sorting a compressed dataset of 100 GB might easily eat 1 TB of WORK (although the final output will once again be only 100 GB).&lt;/P&gt;
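&lt;P&gt;A TAGSORT run on the dataset from the original question would look roughly like this (a sketch; the output name xyz_sorted is made up). Only the BY variables and the observation numbers go to the utility file, at the cost of a longer run time:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* TAGSORT keeps only the sort keys plus observation numbers in the temporary files */
proc sort data = xyz out = xyz_sorted tagsort;
  by a b c d;
run;&lt;/CODE&gt;&lt;/PRE&gt;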
&lt;P&gt;Sorting large compressed datasets is done best with the tagsort option.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 06:54:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/307848#M66002</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-10-28T06:54:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308021#M66026</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;The macro code you provided creates 5x19 observations. It's not subsetting the data. It is just repeating it 5 times.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 20:29:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308021#M66026</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2016-10-28T20:29:17Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308022#M66027</link>
      <description>&lt;P&gt;I prefer to break the dataset into smaller ones and sort those. Check the code below:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro split (dsn=, sets=);
%local first    /* first observation of the current subset */
       last     /* last observation of the current subset */
       n        /* number of obs in the input dataset */
       subset   /* number of the next subset */
       perblock /* obs per subset */
       i        /* loop counter */
;
%let first=1;
%let subset=1;
/* get the observation count without reading any data */
data _null_;
if 0 then set &amp;amp;DSN nobs=nobs;
call symput('N', put(nobs, 9.));
call symput('perblock',put(ceil(nobs/&amp;amp;SETS), 9.));
stop;
run;
%if &amp;amp;N &amp;gt; 5 and &amp;amp;SETS &amp;gt; 1 %then /* N &amp;gt; your-choice */
%do %until (&amp;amp;LAST &amp;gt;= &amp;amp;N);
%let last = %eval(&amp;amp;FIRST + &amp;amp;PERBLOCK - 1);
proc sort data=&amp;amp;DSN
(firstobs=&amp;amp;FIRST obs=&amp;amp;LAST)
out=subset&amp;amp;SUBSET;
by name;
run;
%let first = %eval(&amp;amp;last + 1);
%let subset = %eval(&amp;amp;subset + 1);
%end;
%else %do;
/* too small to split: sort into a single subset so the final step still works */
proc sort data=&amp;amp;dsn out=subset1;
by name;
run;
%let subset = 2;
%end;
/* interleave the subsets that were actually created */
data one;
set
%do i = 1 %to %eval(&amp;amp;subset - 1);
subset&amp;amp;i
%end;
;
by name;
run;
%mend split;
%split (dsn=sashelp.class, sets=4);&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 28 Oct 2016 20:31:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308022#M66027</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2016-10-28T20:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308041#M66033</link>
      <description>&lt;P&gt;I sort datasets like this all the time on my inexpensive (&amp;lt;$1000) off-lease Dell T7400 with 64 GB of RAM and two RAID 0 SSD arrays on separate channels. For moderate-size data like yours I would partition the raw input at the time you create it, using SPDE (free with Base SAS). The T7400s have 8 cores, so you could run, say, 7 parallel jobs. Also split the sort utility files so that they end up in the 7 respective work directories.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS workstation supports 16 cores.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you are sorting on a large numeric key, you can mod the key and run 7 parallel jobs. I like to use a view to set the 7 pieces back together, no BY statement needed. Read time should be the same as reading a single physical file? Actually, it is often nice to have the pieces for future parallelism.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are other techniques if you have an index or a low-cardinality, uniformly distributed grouping variable (not a skewed one).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can use FIRSTOBS and OBS to read 7 partitions, but then you will need a BY statement for the final view.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I switch to a server when a single table is greater than 1 TB; I call this big data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2016 23:50:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308041#M66033</guid>
      <dc:creator>rogerjdeangelis</dc:creator>
      <dc:date>2016-10-28T23:50:08Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308044#M66035</link>
      <description>&lt;P&gt;I consider it a sub-optimal approach to hand-code part of the sorting only because of a lack of disk space. Maybe have a chat with your SAS admin to see whether something can be done.&lt;/P&gt;
&lt;P&gt;You want a fast disk for UTILLOC, but it doesn't have to be the same as WORK, and it can also point to multiple disks.&lt;/P&gt;
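&lt;P&gt;To see what a session currently uses, something like the following can be run (a sketch; the values shown depend on the installation). UTILLOC is set at SAS startup, for example in the configuration file, so changing it is usually a job for the admin.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* show where sort utility files and WORK are located */
proc options option = utilloc;
run;
proc options option = work;
run;

/* show the memory limits that PROC SORT works within */
proc options option = sortsize;
run;
proc options option = memsize;
run;&lt;/CODE&gt;&lt;/PRE&gt;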
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If extending disk space is not an option then I'd go for TAGSORT as already suggested. Creation of a sorted data set will likely take much longer but you shouldn't run out of disk space anymore.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2016 00:37:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308044#M66035</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2016-10-29T00:37:59Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting large datasets on multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308085#M66063</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/83078"&gt;@SuryaKiran&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;The macro code you provided creates 5x19 observations. It's not subsetting the data. It is just repeating it 5 times.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You completely missed what I was pointing at. I just wanted to illustrate that using a set and by statement to combine separately sorted datasets will preserve the correct sort order in the output dataset.&lt;/P&gt;
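&lt;P&gt;A smaller sketch of the same point with two distinct chunks of sashelp.class (19 observations); the dataset names part1, part2, interleaved and direct are made up for illustration. If SET plus BY preserves the order, PROC COMPARE should report no differences between the interleaved result and a direct sort:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* sort two non-overlapping chunks separately */
proc sort data=sashelp.class (firstobs=1 obs=10) out=part1;
by weight height age sex name;
run;

proc sort data=sashelp.class (firstobs=11 obs=19) out=part2;
by weight height age sex name;
run;

/* interleave the sorted chunks */
data interleaved;
set part1 part2;
by weight height age sex name;
run;

/* reference: sort the whole dataset in one go */
proc sort data=sashelp.class out=direct;
by weight height age sex name;
run;

/* compare the two results observation by observation */
proc compare base=direct compare=interleaved;
run;&lt;/CODE&gt;&lt;/PRE&gt;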
&lt;P&gt;Please observe that this discussion thread deals with &lt;U&gt;sorting&lt;/U&gt;, not subsetting.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2016 11:06:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sorting-large-datasets-on-multiple-variables/m-p/308085#M66063</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-10-29T11:06:14Z</dc:date>
    </item>
  </channel>
</rss>

