<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Parallel processing of datastep in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49138#M10159</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Is SPDE part of Base SAS or is it licensed as a separate product? How can I tell if I have a licence for it?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 17 Aug 2011 23:53:12 GMT</pubDate>
    <dc:creator>BruceBrad</dc:creator>
    <dc:date>2011-08-17T23:53:12Z</dc:date>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49131#M10152</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a datastep with the following structure:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;- Merge two large files (with a single BY variable)&lt;/P&gt;&lt;P&gt;- Do some datastep calculations&lt;/P&gt;&lt;P&gt;- Write out another datafile&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is running on SAS 9.2 on a Win2008 x64 server. When it runs, one of the 32 cores on the server is at 100% CPU while the others sit unused, so I presume it could benefit from parallelisation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there an easy way to do this in SAS? I've seen some mention of PROC SQL exploiting multiple threads internally; could I use that? Or has anyone written macro code to do the required splitting and joining?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 16 Aug 2011 12:48:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49131#M10152</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2011-08-16T12:48:47Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49132#M10153</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Bruce,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've never tried it, but you might find the concepts included in the following paper helpful in your attempt:&lt;/P&gt;&lt;P&gt;&lt;A href="http://support.sas.com/resources/papers/proceedings10/109-2010.pdf"&gt;http://support.sas.com/resources/papers/proceedings10/109-2010.pdf&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 16 Aug 2011 13:29:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49132#M10153</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2011-08-16T13:29:48Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49133#M10154</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You can parallelise tasks by splitting the data logically, processing each part in a separate SAS server session, and then combining the results afterwards. Check the SAS documentation for asynchronous processing using SAS/CONNECT, or other posts on this topic. Also consider other performance-enhancing techniques such as data compression, indexing and tuning I/O buffers.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 16 Aug 2011 22:46:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49133#M10154</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2011-08-16T22:46:48Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49134#M10155</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This paper has some sample macro code that could prove helpful:&lt;/P&gt;&lt;P&gt;&lt;A href="http://www2.sas.com/proceedings/forum2007/036-2007.pdf"&gt;http://www2.sas.com/proceedings/forum2007/036-2007.pdf&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Aug 2011 01:50:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49134#M10155</guid>
      <dc:creator>SASJedi</dc:creator>
      <dc:date>2011-08-17T01:50:36Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49135#M10156</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks for the suggestions. I hadn't seen the 2010 paper; it looks close to what I want. It's a shame this feature isn't built into the SAS data step.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The 2010 paper suggests putting the 'normal' datastep code into a separate file so it can be read by multiple SAS processes. It would be neater if this code were in a macro in the main program (like the SAS bootstrap macros); that way I could also use global macro variables and macros within the data step. To do this, I guess I would need to pass the global macro environment of the main process (eg defined macros and global macro variables) to the spawned processes. Is this possible in SAS?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Aug 2011 06:11:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49135#M10156</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2011-08-17T06:11:24Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49136#M10157</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The SPDE library engine works very well for this, as long as the I/O subsystem on the machine is capable of supporting it. The SPDE engine also provides a number of other benefits, such as being able to split data sets over multiple I/O channels and disks. I use the engine for extremely large data: tables in the trillions of rows, for example. Here are some metrics from a job I ran just the other day:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I extracted a table from an Oracle data warehouse. The data is column-dimensioned: 12.3T rows and 4 columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Extract in 8 threads to a single data set using dbslice: 2 hours&lt;/P&gt;&lt;P&gt;PROC SORT of the entire set by primary and secondary key: 5 hours&lt;/P&gt;&lt;P&gt;DATA step to transpose the data using BY-group processing and arrays: 9 hours&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The resulting data set is a row-dimensioned SAS table with 200M rows and 5500 columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Without the SPDE library engine, these processes took so long that they were basically deemed useless. Most significantly, the final transpose step took more than 3 full days to run.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The machine used to produce the above numbers runs 64-bit Linux with 8 cores and 42 GB of RAM. Storage is a 24-disk DAS with two 12-disk RAID 5 groups that are then combined in a RAID 0 configuration. The SPDE engine is using 8 data store points and a maximum of 24 threads.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In short, I LOVE the SPDE library engine! It is an incredibly useful tool for big data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Another method for data step multithreading is, as others said, to split the data into logical groups and use the MP-CONNECT tools (available if you have a SAS/CONNECT license) to run multiple asynchronous steps. Personally, I prefer the comparative simplicity of the SPDE engine.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Aug 2011 21:36:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49136#M10157</guid>
      <dc:creator>FriedEgg</dc:creator>
      <dc:date>2011-08-17T21:36:41Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49137#M10158</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes. Check out the %SYSRPUT and %SYSLPUT macro statements. Be careful with macros shared between the main and spawned processes: a macro has to be submitted in every process for it to be available there. I still think %include is a cleaner proposition, but that may depend on the complexity of your processing.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Aug 2011 21:45:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49137#M10158</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2011-08-17T21:45:25Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49138#M10159</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Is SPDE part of Base SAS or is it licensed as a separate product? How can I tell if I have a licence for it?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Aug 2011 23:53:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49138#M10159</guid>
      <dc:creator>BruceBrad</dc:creator>
      <dc:date>2011-08-17T23:53:12Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49139#M10160</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would think you could tell by trying to run something like:&lt;/P&gt;&lt;P&gt;proc spdo; run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you don't have it, I would guess the log would inform you of that fact.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Aug 2011 00:28:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49139#M10160</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2011-08-18T00:28:54Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49140#M10161</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The SPDE library engine is included as part of your Base SAS license, which is quite nice. It is something of an introductory version of the SPDS product.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Aug 2011 01:21:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49140#M10161</guid>
      <dc:creator>FriedEgg</dc:creator>
      <dc:date>2011-08-18T01:21:14Z</dc:date>
    </item>
    <item>
      <title>Parallel processing of datastep</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49141#M10162</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Bruce&lt;/P&gt;&lt;P&gt;&amp;nbsp; &lt;/P&gt;&lt;P&gt;As you say, "I guess I would need to pass the global macro environment of the main process (eg defined macros and global macro variables) to the spawned processes".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;It is just a matter of passing one parameter to each subtask, to identify which segment that subtask needs to process, &lt;/P&gt;&lt;P&gt;plus making the macro pool addressable by all subtasks, or having duplicates of the macro pool privately available to each separate subtask.&lt;/P&gt;&lt;P&gt;The single parameter serves multiple purposes:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; the subtask "task-id", &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; the subsetting value for a WHERE clause, &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; and/or a subscript on the names of the input and output data sets.&lt;/P&gt;&lt;P&gt;All other parameters, like "global macro variables", will be common to all subtasks, so they can be part of that "macro pool".&lt;/P&gt;&lt;P&gt;Keeping the parameter list "singular" seems best. &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;peterC &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Aug 2011 09:38:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Parallel-processing-of-datastep/m-p/49141#M10162</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2011-08-18T09:38:16Z</dc:date>
    </item>
  </channel>
</rss>

