<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance Performance! in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93520#M26537</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ordering columns:&lt;/P&gt;&lt;P&gt;I'm not sure, but I would expect this to depend on the DBMS. Many of them maintain table statistics and will optimise your SQL anyway. I don't know whether it makes a difference, but ordering a where clause/join condition along an index, or putting the column with the fewest distinct values first, might help a SQL optimiser get things right.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Parallelizing JOBS is normally done via a scheduler (e.g. LSF). If you want to parallelize tasks within a single job, you would use a loop transformation - in your case with the inner job doing the extracts and the outer job then loading the result into the target(s). Reading a source twice doesn't lock the table (only writing to a SAS table locks the table if you are not using SAS/Share). Certain procedures are also able to "multithread" - this is a kind of parallel processing within a procedure.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my experience, when working with large data volumes the following normally have the biggest impact on performance:&lt;/P&gt;&lt;P&gt;- reduce volumes as early as possible&lt;/P&gt;&lt;P&gt;- minimise passes through the data&lt;/P&gt;&lt;P&gt;- minimise data exchange between SAS and the DBMS (and reduce data volumes before exchanging)&lt;/P&gt;&lt;P&gt;- minimise disk I/O and do as much as possible in memory&lt;/P&gt;&lt;P&gt;- allocate enough memory to a job - avoid paging (= more disk I/O)&lt;/P&gt;&lt;P&gt;- investigate and tweak DB settings (e.g. the insertbuffer as part of a DB library definition)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 27 Jun 2012 13:47:22 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2012-06-27T13:47:22Z</dc:date>
    <item>
      <title>Performance Performance!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93519#M26536</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I work with SAS DI Studio 4.2 and I am currently reading through some high-level papers on SAS performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;They talk about increasing SQL join performance by ordering columns the right way - so, what IS the right way to order columns? Let's say there are 2 tables, each containing 80 columns, and I need to join them by 3 columns: #2, #44 and #78. Do I move #2, #44 and #78 to positions #1, #2 and #3 before joining?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Another question is about "in-job parallelization" - let's say there is 1 job containing 1 source table, 2 Extracts based on the source table, 1 Append that appends the results of the 2 Extracts, and a Table Loader that writes the data into another table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How would I&lt;/P&gt;&lt;P&gt;- make the 2 Extracts run in parallel via DI Studio&lt;/P&gt;&lt;P&gt;- make this work without blocking the source table?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks a lot,&lt;/P&gt;&lt;P&gt;th&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jun 2012 12:42:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93519#M26536</guid>
      <dc:creator>thomash123</dc:creator>
      <dc:date>2012-06-27T12:42:59Z</dc:date>
    </item>
    <item>
      <title>Re: Performance Performance!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93520#M26537</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ordering columns:&lt;/P&gt;&lt;P&gt;I'm not sure, but I would expect this to depend on the DBMS. Many of them maintain table statistics and will optimise your SQL anyway. I don't know whether it makes a difference, but ordering a where clause/join condition along an index, or putting the column with the fewest distinct values first, might help a SQL optimiser get things right.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Parallelizing JOBS is normally done via a scheduler (e.g. LSF). If you want to parallelize tasks within a single job, you would use a loop transformation - in your case with the inner job doing the extracts and the outer job then loading the result into the target(s). Reading a source twice doesn't lock the table (only writing to a SAS table locks the table if you are not using SAS/Share). Certain procedures are also able to "multithread" - this is a kind of parallel processing within a procedure.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my experience, when working with large data volumes the following normally have the biggest impact on performance:&lt;/P&gt;&lt;P&gt;- reduce volumes as early as possible&lt;/P&gt;&lt;P&gt;- minimise passes through the data&lt;/P&gt;&lt;P&gt;- minimise data exchange between SAS and the DBMS (and reduce data volumes before exchanging)&lt;/P&gt;&lt;P&gt;- minimise disk I/O and do as much as possible in memory&lt;/P&gt;&lt;P&gt;- allocate enough memory to a job - avoid paging (= more disk I/O)&lt;/P&gt;&lt;P&gt;- investigate and tweak DB settings (e.g. the insertbuffer as part of a DB library definition)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jun 2012 13:47:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93520#M26537</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2012-06-27T13:47:22Z</dc:date>
    </item>
    <item>
      <title>Re: Performance Performance!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93521#M26538</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks a lot for your quick reply!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I guess I wasn't specific enough when referring to SQL joins. I was thinking of joins of SAS datasets, not joins of data in DBMS tables. Maybe there are some differences to consider?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;th&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jun 2012 13:53:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93521#M26538</guid>
      <dc:creator>thomash123</dc:creator>
      <dc:date>2012-06-27T13:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Performance Performance!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93522#M26539</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The one thing that comes to mind is that SAS can only use one index at a time - so you want to make sure that it uses the "best" one.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql FEEDBACK; ... will write to the log how SAS reshuffles your query.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've also found that it's sometimes really worth reformulating a join condition - especially if there are ORs in it.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Jun 2012 14:04:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Performance-Performance/m-p/93522#M26539</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2012-06-27T14:04:23Z</dc:date>
    </item>
  </channel>
</rss>

