<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Automate SAS code to wait for sorted datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560011#M156470</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The reasons for splitting up the source tables are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. The source table has 800+ million records, and it has to be joined with some other tables to create the final target dataset.&lt;/P&gt;&lt;P&gt;2. Initially we tried doing the join with PROC SQL to get the final output, but this takes a long time and does not run to completion; I get errors like "insufficient space". So we decided to split the tables and then join or merge them, in order to complete the whole process in much less time.&lt;/P&gt;&lt;P&gt;3. The main aim of splitting is to complete the whole process in much less time and to avoid space issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562" target="_blank" rel="noopener"&gt;I am really trying to understand what&amp;nbsp; @KurtBremser&lt;/A&gt;&lt;/U&gt;&amp;nbsp;mentioned in this thread, but I am unable to grasp it fully and I am not sure how to proceed. Sorry to trouble you with this question even after so much explanation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please explain how the hash code mentioned below works, and whether it will split the data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 20 May 2019 06:44:08 GMT</pubDate>
    <dc:creator>JJP1</dc:creator>
    <dc:date>2019-05-20T06:44:08Z</dc:date>
    <item>
      <title>Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559554#M156262</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Could you please help me automate the following steps?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Get the number of records in a SAS dataset.&lt;/P&gt;&lt;P&gt;2. Split the dataset into 8 jobs based on the number of records (approximately 1 million), sort the split datasets by the key, and run these 8 jobs in parallel.&lt;/P&gt;&lt;P&gt;3. Add a step that waits and checks whether the split datasets have been sorted successfully, so that we can proceed with the next step.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 03:56:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559554#M156262</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-17T03:56:12Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559555#M156263</link>
      <description>&lt;P&gt;Unless the 8 jobs can do their sort in memory or use their own disks, this could be much slower than a single sort (or even successive sorts), as the jobs will be competing for the same disk resources.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 04:35:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559555#M156263</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-17T04:35:24Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559559#M156265</link>
      <description>&lt;P&gt;Thanks. Could you please help with the coding part, where we split the dataset (which has 800+ million records) into datasets of 1 million records each (8 subsets) and sort them simultaneously to reduce the run time?&lt;/P&gt;&lt;P&gt;How should we approach this? I have been using the split 1 macro, which I found in SAS papers, to split the dataset into a number of subsets.&lt;/P&gt;&lt;P&gt;But the sorts should start simultaneously, as soon as each split dataset is created; this is the part where I am stuck. I also need a wait step that waits until these datasets are complete.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 05:43:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559559#M156265</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-17T05:43:39Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559561#M156266</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/256123"&gt;@JJP1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Why don't you just go for a multi-threaded sort? That will likely perform better than anything you can code manually.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://go.documentation.sas.com/?docsetId=proc&amp;amp;docsetTarget=p0guut2xk8yz2yn17ibn9nwcyx8v.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#p0lxobanuchmjin1daufy56f496w"&gt;https://go.documentation.sas.com/?docsetId=proc&amp;amp;docsetTarget=p0guut2xk8yz2yn17ibn9nwcyx8v.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#p0lxobanuchmjin1daufy56f496w&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
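&lt;P&gt;A minimal sketch of such a threaded sort, assuming a dataset HAVE with the key variables key1, key2 and key3 (hypothetical names, not taken from the question). THREADS and CPUCOUNT= are standard SAS options; whether the sort actually runs threaded also depends on the SAS release and the host.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Allow threaded processing and let SAS use all CPUs it detects. */
options threads cpucount=actual;

/* THREADS on the PROC SORT statement requests a multi-threaded sort. */
/* HAVE, WANT and key1-key3 are placeholder names.                    */
proc sort data=have out=want threads;
  by key1 key2 key3;
run;&lt;/CODE&gt;&lt;/PRE&gt;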
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If that's still not good enough, then investigate how UTILLOC is assigned and possibly try to spread it out over multiple disks (for better disk I/O).&lt;/P&gt;
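&lt;P&gt;To see how these locations are currently assigned, PROC OPTIONS can print the values to the log. This is only a quick check. UTILLOC is set at SAS startup (configuration file or command line), so changing it usually means involving your SAS administrator.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Print the current WORK location to the log. */
proc options option=work;
run;

/* Print the current location(s) used for utility files. */
proc options option=utilloc;
run;&lt;/CODE&gt;&lt;/PRE&gt;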
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And last but not least: Consider creating the source and sorted target table using the SPDE engine, and also make sure that the data chunks and SPDEUTILLOC get spread out over multiple disks.&lt;BR /&gt;&lt;A href="https://support.sas.com/documentation/cdl/en/engspde/69752/HTML/default/viewer.htm#p14iytucvmskdtn1qeye9zkpp0ss.htm" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/engspde/69752/HTML/default/viewer.htm#p14iytucvmskdtn1qeye9zkpp0ss.htm&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
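&lt;P&gt;A rough sketch of what an SPDE library spread over several disks could look like. All paths, the libref SPDELIB and the dataset and variable names are hypothetical; the right layout depends entirely on the storage available on your server.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical paths: metadata on one disk, data partitions on two others. */
libname spdelib spde '/disk1/spde_meta'
  datapath=('/disk2/spde_data' '/disk3/spde_data');

/* Write the sorted table into the SPDE library. */
proc sort data=have out=spdelib.want;
  by key1 key2 key3;
run;&lt;/CODE&gt;&lt;/PRE&gt;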
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 06:11:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559561#M156266</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-05-17T06:11:27Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559582#M156276</link>
      <description>&lt;P&gt;You need to fit your computing and storage power to the size of your data. SAS sorting tools (proc sort, sorting in SQL, implicit sorting in SPDE) are already multi-threaded, and can use proper infrastructure if it's there. If the infrastructure is not there, you can't magically create better performance. Your split/sort/merge will most probably be worse than what you have now.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Separate WORK and UTILLOC. Use striped, high-quality SSD's for these volumes. Have enough cores.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want to split, do not run the sorts in parallel, run them one after another, to avoid contention on the disks (especially if you still use spinning metal). Write that first without macro code, see if it helps at all, and once that is confirmed, make it dynamic.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 08:27:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559582#M156276</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-17T08:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559600#M156286</link>
      <description>&lt;P&gt;OK, thank you. I need to create a wait step that waits until the sort operation completes.&lt;/P&gt;&lt;P&gt;Could you please help with this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 09:37:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559600#M156286</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-17T09:37:21Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559605#M156290</link>
      <description>&lt;P&gt;Read the last paragraph of my post again.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 09:50:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559605#M156290</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-17T09:50:56Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559608#M156293</link>
      <description>&lt;P&gt;Yes, I have decided to run this manually, without any macros. I cannot run it right now because our SAS server is experiencing slowness,&lt;/P&gt;&lt;P&gt;so I am preparing to do this once the environment is back and stable.&lt;/P&gt;&lt;P&gt;I know how to write the code for the splitting and sorting,&lt;/P&gt;&lt;P&gt;but I don't know how to write code that waits until the sorted datasets are complete.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, sorry to ask again, but I am not clear on the statement below; please help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"&lt;SPAN&gt;Separate WORK and UTILLOC. Use striped, high-quality SSD's for these volumes. Have enough cores."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How can we follow the above approach?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1. Please confirm whether the approach you are suggesting will be better than what I posted in this thread (split, sort and merge in parallel).&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 10:11:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559608#M156293</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-17T10:11:17Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559617#M156295</link>
      <description>&lt;P&gt;When you run the sorts in succession (e.g. in a macro %do loop), you can interleave/concatenate the resulting datasets as soon as the loop is finished; there is no need for a special wait.&lt;/P&gt;
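&lt;P&gt;A minimal sketch of that idea, written without a macro loop and assuming a dataset BIG with 200 million rows that is sorted by KEY (hypothetical names and sizes). Within one SAS session the steps run one after another, so the final data step only starts once both sorts have finished.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort the first slice (rows 1 to 100,000,000). */
proc sort data=big(firstobs=1 obs=100000000) out=part1;
  by key;
run;

/* Sort the second slice; this step starts only after the first has ended. */
proc sort data=big(firstobs=100000001 obs=200000000) out=part2;
  by key;
run;

/* Interleave the sorted slices: SET with BY keeps the result in key order. */
data big_sorted;
  set part1 part2;
  by key;
run;&lt;/CODE&gt;&lt;/PRE&gt;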
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Having infrastructure that lets you use simple code without any artificial optimizations is always to be preferred. Especially if the problem occurs repeatedly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In my ~1000 SAS batch jobs, I have exactly one where I split a large dataset into subsets, and that was not because of performance, but because the required sort would use so much of UTILLOC that other, concurrent jobs would not have enough left. Mind that the overall performance is considerably worse than if I ran it the usual way, but then I would only be able to test it through the main batch job production user which has no quotas defined. Something you do not want, period.&lt;/P&gt;
&lt;P&gt;So I do this&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;dynamically split along the contents of the first original "by" variable&lt;/LI&gt;
&lt;LI&gt;sort individually, and run proc means against each subset, using only the second and following "by" variables&lt;/LI&gt;
&lt;LI&gt;stack the result subsets in sorted order, thereby emulating a sort of the top level "by" variable&lt;/LI&gt;
&lt;/UL&gt;
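&lt;P&gt;The steps listed above could be sketched roughly as follows, using a control dataset and CALL EXECUTE instead of macro code. This is an illustration under assumptions, not the production job: the dataset BIG, the numeric top-level variable key1, the lower-level variables key2 and key3 and the output BIG_SORTED are all hypothetical, and the PROC MEANS step is left out.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Control dataset: one row per distinct value of the top-level BY variable. */
proc sort data=big(keep=key1) out=control nodupkey;
  by key1;
run;

/* Generate one sort and one append per value. The generated steps run  */
/* one after another once this data step has finished.                  */
/* BIG_SORTED must not exist beforehand; the first append creates it.   */
data _null_;
  set control;
  call execute(cats('proc sort data=big(where=(key1=', key1,
                    ')) out=_part; by key2 key3; run;'));
  call execute('proc append base=big_sorted data=_part; run;');
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because CONTROL is sorted by key1, appending the subsets in that order leaves BIG_SORTED ordered by key1, key2 and key3 without a final sort.&lt;/P&gt;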
&lt;P&gt;IIRC, I do not even use macro code. I create a control dataset with the distinct values of the top-level by variable, and use call execute repeatedly to work through the data one value at a time.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 10:44:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559617#M156295</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-17T10:44:45Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559620#M156297</link>
      <description>Kurt, For that massive sort, have you tried the TAGSORT option instead? Not saying it will be better, wondering rather.&lt;BR /&gt;</description>
      <pubDate>Fri, 17 May 2019 10:50:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559620#M156297</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-17T10:50:45Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559624#M156300</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Kurt, For that massive sort, have you tried the TAGSORT option instead? Not saying it will be better, wondering rather.&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I often use tagsort for datasets that produce over-large utility files (because of the compress option), but in this case, the "big chunk" was so large that I already had to "dance" across multiple workspaces to even create it. And when a new hierarchy level with considerable cardinality was added that prevented me from using class in proc means, the sort would either be too resource- (usual sort) or time- (tagsort) consuming, so I opted for the split, where I could delete the master dataset after splitting, and the individual steps ran in a decent timeframe. Mind that I don't create the big dataset any longer; I already create split datasets in the first place now.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this happened to me more than once, I would surely have decided to increase my UTILLOC and workspaces.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As the resulting code is far from "elegant" and therefore violates several of my Maxims, it is a clear exception to the rules. And it was one of the things I went through in detail when I started introducing my successor on the job, so he won't have a bad surprise the first time he has to do maintenance on it.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 11:12:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559624#M156300</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-17T11:12:11Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559626#M156301</link>
      <description>Thanks for the insight. And I presume you split with a where clause on the sort key rather than by obs, so the sorted data sets are ready to use? &lt;BR /&gt;</description>
      <pubDate>Fri, 17 May 2019 11:19:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559626#M156301</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-17T11:19:45Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559687#M156328</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks for the insight. And I presume you split with a where clause on the sort key rather than by obs, so the sorted data sets are ready to use? &lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Yes. That way the final step after the summaries is a very simple data step that concatenates everything, without even a by for interleaving needed.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 15:18:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559687#M156328</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-17T15:18:23Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559823#M156390</link>
      <description>Yes. SET BY really kills speed compared to SET. Proc append rather than a data step might cut a bit of time too. Thanks again for sharing. &lt;BR /&gt;</description>
      <pubDate>Fri, 17 May 2019 23:20:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559823#M156390</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-17T23:20:45Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559833#M156394</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/256123"&gt;@JJP1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you read what&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;writes in this thread, then you'll understand that the manual approach you want to take is only suitable for rare, exceptional cases. What's your reason for splitting up the source table?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If space restrictions in work/utilloc are the issue and the sort keys fit into memory then one coding alternative is the use of a hash table.&lt;/P&gt;
&lt;P&gt;In my environment I get comparable run times for all 3 sort steps with the sample data below.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer;
data have(compress=yes);
  array bigvars{30} $100. (30*'A bbb cccc');
  do key1=8000 to 1 by -1;
    key2=rand('integer', 100);
    do key3=1000 to 1 by -1;
      output;
    end;
  end;
  stop;
run;

data want_hashsort(sortedby=key1 key2 key3 compress=yes);
  if _n_=1 then
    do;
      length _rownum 8;
      dcl hash h1(multidata:'y', ordered:'y');
      dcl hiter hh1('h1');
      h1.defineKey('key1','key2','key3');
      h1.defineData('_rownum');
      h1.defineDone();

      do while(not _last);
        set have(keep=key1 key2 key3) end=_last;
        _rownum+1;
        _rc =h1.add();
      end;
    end;

  _rc = hh1.first();
  do while (_rc = 0);
    set have point=_rownum;
    output;
    _rc = hh1.next();
  end;
  stop;
run;

proc datasets lib=work nolist nowarn;
  delete want_:;
  run;
quit;

proc sort data=have out=want_procsort1(compress=yes);
  by key1 key2 key3;
run;

proc datasets lib=work nolist nowarn;
  delete want_:;
  run;
quit;

proc sort data=have out=want_procsort2(compress=yes) tagsort;
  by key1 key2 key3;
run;

proc datasets lib=work nolist nowarn;
  delete have want_:;
  run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Real Times&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Hash:&amp;nbsp; 1:04&lt;/P&gt;
&lt;P&gt;Sort1:&amp;nbsp; 1:16&lt;/P&gt;
&lt;P&gt;Sort2:&amp;nbsp; 0:53&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From what I understand, TAGSORT actually does something pretty similar to the hash approach above, so TAGSORT is probably the way to go if work space is the issue, even though the sort is then no longer threaded.&lt;/P&gt;</description>
      <pubDate>Sat, 18 May 2019 01:26:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/559833#M156394</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-05-18T01:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560011#M156470</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;The reasons for splitting up the source tables are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. The source table has 800+ million records, and it has to be joined with some other tables to create the final target dataset.&lt;/P&gt;&lt;P&gt;2. Initially we tried doing the join with PROC SQL to get the final output, but this takes a long time and does not run to completion; I get errors like "insufficient space". So we decided to split the tables and then join or merge them, in order to complete the whole process in much less time.&lt;/P&gt;&lt;P&gt;3. The main aim of splitting is to complete the whole process in much less time and to avoid space issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562" target="_blank" rel="noopener"&gt;I am really trying to understand what&amp;nbsp; @KurtBremser&lt;/A&gt;&lt;/U&gt;&amp;nbsp;mentioned in this thread, but I am unable to grasp it fully and I am not sure how to proceed. Sorry to trouble you with this question even after so much explanation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please explain how the hash code mentioned below works, and whether it will split the data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 06:44:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560011#M156470</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-20T06:44:08Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560013#M156472</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/256123"&gt;@JJP1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for explaining the reason for splitting. That helps. And no reason for apologies. I'd say you get that much "resistance" from "us" because we don't want you to go down the wrong rabbit hole.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Nothing will take "very less time" with tables that big, but there are potentially ways to combine data from different sources without the need to sort everything physically on disk. What's possible and "best" depends on what you have and what you need.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I believe it would be worth looking into the bigger problem, so that we don't solve something that possibly doesn't require solving (like coding for a "split-sort"). For us to give some guidance on the bigger challenge, you would need to provide more info, like:&lt;/P&gt;
&lt;P&gt;1. Name of source tables (assumed all stored as SAS tables, else please specify if any database involved)&lt;/P&gt;
&lt;P&gt;2. Number of rows per source table and eventually also the size per table&lt;/P&gt;
&lt;P&gt;3. The code of the actual join which you've got working already for lower volumes (so we can understand the join logic)&lt;/P&gt;
&lt;P&gt;4. The length of the variables used for joining (so we can estimate memory consumption for any in-memory approach)&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. Is there any need for the resulting target table to be sorted in a specific way for further downstream processing.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 08:03:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560013#M156472</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-05-20T08:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560014#M156473</link>
      <description>&lt;P&gt;Whenever you run into "insufficient disk space" while doing a join in SQL (or experience unexpectedly bad performance), you positively need to test the basic SAS method of sorting and doing a data step merge before you hare off into really complicated coding, provided you do not need the SQL capability of creating a cartesian product. If a simple proc sort of a dataset is not possible (even when using the TAGSORT option), then you need to either work on your infrastructure or go into complex coding.&lt;/P&gt;
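&lt;P&gt;A minimal sketch of that basic method, assuming two datasets SMALL and BIG that share the key ID (hypothetical names). TAGSORT is shown on the large table only; it trades run time for less utility space. The subsetting IF mimics a left join as long as the relationship is not many-to-many.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort both inputs by the join key. */
proc sort data=small out=small_s;
  by id;
run;

/* TAGSORT sorts only the keys plus a row pointer, reducing utility space. */
proc sort data=big out=big_s tagsort;
  by id;
run;

/* Data step merge; keep every row that has a match in SMALL_S. */
data joined;
  merge small_s(in=in_small) big_s;
  by id;
  if in_small;
run;&lt;/CODE&gt;&lt;/PRE&gt;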
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But whatever you do, sorting in sequence is to be preferred over sorting in parallel.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 06:59:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560014#M156473</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-20T06:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560030#M156487</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt; for the kind response.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Please find the details below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Name of source tables (assumed all stored as SAS tables, else please specify if any database involved)&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;All are stored as SAS tables.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;2. Number of rows per source table and eventually also the size per table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Library Name&lt;/TH&gt;&lt;TH&gt;Member Name&lt;/TH&gt;&lt;TH&gt;Size of File&lt;/TH&gt;&lt;TH&gt;Size of File&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;AAA&lt;/TD&gt;&lt;TD&gt;2GB&lt;/TD&gt;&lt;TD&gt;1931520K&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;BBB&lt;/TD&gt;&lt;TD&gt;102GB&lt;/TD&gt;&lt;TD&gt;1.0673E8K&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;CCC&lt;/TD&gt;&lt;TD&gt;1GB&lt;/TD&gt;&lt;TD&gt;1445376K&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;DDD&lt;/TD&gt;&lt;TD&gt;3GB&lt;/TD&gt;&lt;TD&gt;2657024K&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;EEE&lt;/TD&gt;&lt;TD&gt;1GB&lt;/TD&gt;&lt;TD&gt;1227008K&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;XXX&lt;/TD&gt;&lt;TD&gt;FFF&lt;/TD&gt;&lt;TD&gt;2GB&lt;/TD&gt;&lt;TD&gt;2466048K&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Table_Name&lt;/TH&gt;&lt;TH&gt;Record count&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;AAA&lt;/TD&gt;&lt;TD&gt;22758772&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;BBB&lt;/TD&gt;&lt;TD&gt;8.1021E8&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;CCC&lt;/TD&gt;&lt;TD&gt;28754734&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;DDD&lt;/TD&gt;&lt;TD&gt;27547919&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;EEE&lt;/TD&gt;&lt;TD&gt;24549490&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;FFF&lt;/TD&gt;&lt;TD&gt;24499890&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;3. The code of the actual join which you've got working already for lower volumes (so we can understand the join logic)&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;I joined the small tables and stored the result in one permanent dataset, sorted the large dataset and stored it in another permanent dataset, and then tried merging the two. But this takes a long time and runs into the insufficient space issue.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;The code for the actual join is:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
   CREATE TABLE joined1 AS 
   SELECT t1.AAAAAAAAAAA, 
          t1.PPPPPPPPPPPP, 
          t1.CCCCCCCCCCCC, 
          t1.DDDDDD, 
          t1.TTTTTTTTTTT, 
          t1.SSSSSSSSS, 
          t1.GGGGGGGGGGGGG, 
          t2.KKKKKKKKKKKK, 
          t2.FFFFFFFFFFFF, 
          t2.YYYYYYYY, 
          t2.RRRRRRR, 
          t2.EEEEEEEEEEEE, 
          t2.UUUUUUUUUUUUU, 
          t2.ZZZZZZZZZZZZZZZZ, 
          t2.LLLLLLLLLLLL, 
          t3.JJJJJJJJJ, 
          t3.NNNNNNNNNNNNNN, 
          t4.OOOOOOOOOOOO, 
          t4.VVVVVVVVVVVV
      FROM AAA t1
           left JOIN FFF t2 ON (t1.AAAAAAAAAAA = t2.AAAAAAAAAAA)
           left JOIN EEE t3 ON (t2.KKKKKKKKKKKK = t3.KKKKKKKKKKKK)
           left JOIN CCC t4 ON (t3.JJJJJJJJJ = t4.JJJJJJJJJ);
QUIT;

PROC SQL;
   CREATE TABLE joined2 AS 
   SELECT t1.*, 
          t2.YYYYYYYYYYYYYYY, 
          t2.SAMPLE, 
          t2.SAMPLE1, 
          t2.SAMPLE2, 
          t2.SAMPLE3, 
          t2.SAMPLE4, 
          t2.SAMPLE5, 
          t2.SAMPLE6, 
          t2.SAMPLE7, 
          t2.SAMPLE8, 
          t2.SAMPLE9, 
          t2.SAMPLE10
      FROM joined1 t1
           left JOIN DDD t2 ON (t1.OOOOOOOOOOOO = t2.OOOOOOOOOOOO);
QUIT;


proc sort data=xx.BBB out=BBB; /* large dataset */
  by YYYYYYYYYYYYYYY QQQQQQQQQQ;
run;
proc sort data=joined2;
  by YYYYYYYYYYYYYYY;
run;
data joined3;
  merge joined2(in=ina) BBB(in=inb);
  by YYYYYYYYYYYYYYY;
  if ina;
run;


&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4. The length of the variables used for joining (so we can estimate memory consumption for any in-memory approach)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The lengths of the columns I am using for the joins are: 16, 23, 32, 28, 23&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please excuse me for using pseudo values for the table names and column names in the code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;5. Is there any need for the resulting target table to be sorted in a specific way for further downstream processing&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;Yes. The resulting target table needs to be further subset into 3 different datasets based on different conditions.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 08:07:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560030#M156487</guid>
      <dc:creator>JJP1</dc:creator>
      <dc:date>2019-05-20T08:07:36Z</dc:date>
    </item>
    <item>
      <title>Re: Automate SAS code to wait for sorted datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560035#M156490</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/256123"&gt;@JJP1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
   CREATE TABLE joined1 AS 
   SELECT t1.AAAAAAAAAAA, 
          t1.PPPPPPPPPPPP, 
          t1.CCCCCCCCCCCC, 
          t1.DDDDDD, 
          t1.TTTTTTTTTTT, 
          t1.SSSSSSSSS, 
          t1.GGGGGGGGGGGGG, 
          t2.KKKKKKKKKKKK, 
          t2.FFFFFFFFFFFF, 
          t2.YYYYYYYY, 
          t2.RRRRRRR, 
          t2.EEEEEEEEEEEE, 
          t2.UUUUUUUUUUUUU, 
          t2.ZZZZZZZZZZZZZZZZ, 
          t2.LLLLLLLLLLLL, 
          t3.JJJJJJJJJ, 
          t3.NNNNNNNNNNNNNN, 
          t4.OOOOOOOOOOOO, 
          t4.VVVVVVVVVVVV
      FROM AAA t1
           left JOIN FFF t2 ON (t1.AAAAAAAAAAA = t2.AAAAAAAAAAA)
           left JOIN EEE t3 ON (t2.KKKKKKKKKKKK = t3.KKKKKKKKKKKK)
           left JOIN CCC t4 ON (t3.JJJJJJJJJ = t4.JJJJJJJJJ);
QUIT;

PROC SQL;
   CREATE TABLE joined2 AS 
   SELECT t1.*, 
          t2.YYYYYYYYYYYYYYY, 
          t2.SAMPLE, 
          t2.SAMPLE1, 
          t2.SAMPLE2, 
          t2.SAMPLE3, 
          t2.SAMPLE4, 
          t2.SAMPLE5, 
          t2.SAMPLE6, 
          t2.SAMPLE7, 
          t2.SAMPLE8, 
          t2.SAMPLE9, 
          t2.SAMPLE10
      FROM joined1 t1
           left JOIN DDD t2 ON (t1.OOOOOOOOOOOO = t2.OOOOOOOOOOOO);
QUIT;


proc sort data=xx.BBB out=BBB; /* large dataset */
  by YYYYYYYYYYYYYYY QQQQQQQQQQ;
run;
proc sort data=joined2;
  by YYYYYYYYYYYYYYY;
run;
data joined3;
  merge joined2(in=ina) BBB(in=inb);
  by YYYYYYYYYYYYYYY;
  if ina;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;And where in this code does the "insufficient disk space" happen? While sorting the large table, or during the merge?&lt;/P&gt;
&lt;P&gt;Please post the whole log of that failing step.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 08:18:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Automate-SAS-code-to-wait-for-sorted-datasets/m-p/560035#M156490</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-05-20T08:18:41Z</dc:date>
    </item>
  </channel>
</rss>

