<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Optimize SQL Query in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584196#M166326</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/115701"&gt;@Harsh_Vira&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.&lt;BR /&gt;&lt;BR /&gt;On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given the number of rows in the file, I think 6 minutes may not be bad.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best wishes,&lt;/P&gt;
&lt;P&gt;Jeff&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Aug 2019 12:22:12 GMT</pubDate>
    <dc:creator>JBailey</dc:creator>
    <dc:date>2019-08-27T12:22:12Z</dc:date>
    <item>
      <title>Optimize SQL Query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584161#M166315</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on&amp;nbsp;&lt;STRONG&gt;SAS version: 9.04.01M4P110916&lt;/STRONG&gt; which is hosted on&lt;STRONG&gt; RHEL&amp;nbsp;64bit.&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to optimize a Proc SQL query.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The source data is in Hadoop and client has only SAS ACCESS FOR HADOOP connector.&lt;BR /&gt;The records in source table are 101730000.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the SAS query:&lt;/P&gt;&lt;P&gt;PROC SQL;&lt;BR /&gt;RESET inobs=max outobs=max noflow nofeedback noprompt nonumber;&lt;BR /&gt;CREATE TABLE work.Test1 AS&lt;/P&gt;&lt;P&gt;select distinct&amp;nbsp; From libname.table&lt;BR /&gt;Where column8 in (&amp;nbsp;value1,value2,value3,value4,value5,value6,value7,value8,value9,value10,value11,value12,value13,value14,value15,&lt;BR /&gt;value16,value17);&lt;BR /&gt;QUIT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The current query execution time: 06mins&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please suggest is there anyway i can improve the efficiency (execution time) of above step.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 10:30:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584161#M166315</guid>
      <dc:creator>Harsh_Vira</dc:creator>
      <dc:date>2019-08-27T10:30:19Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize SQL Query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584190#M166323</link>
      <description>&lt;P&gt;Your query isn't your actual one I guess, so for instance, how does your IN clauase actally look like?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Second - try your query direct in Hive.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your query goes faster there, check how SAS transforms your step into HiveQL, by adding:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you experience the same performance, turn to a Hive DBA to help you organize your table/rewrite your query.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Generally speaking, 6 minutes is not a long time if you are in a data science environment. If you need faster response time, consider other storage/execution options, like Impala.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 11:56:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584190#M166323</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2019-08-27T11:56:21Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize SQL Query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584196#M166326</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/115701"&gt;@Harsh_Vira&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;provides great advice regarding the OPTIONS statement. It is very important to know the query that SAS is sending to Hive.&lt;BR /&gt;&lt;BR /&gt;On the Hadoop side: it is important to understand how the data is arranged. Partitioning the data may help if Hive/MapReduce can use partition elimination. The Hadoop admins can help with this. It is also a good idea to try the query in HUE and see how it performs without bringing the result set into SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given the number of rows in the file, I think 6 minutes may not be bad.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best wishes,&lt;/P&gt;
&lt;P&gt;Jeff&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2019 12:22:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimize-SQL-Query/m-p/584196#M166326</guid>
      <dc:creator>JBailey</dc:creator>
      <dc:date>2019-08-27T12:22:12Z</dc:date>
    </item>
  </channel>
</rss>

