<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: need to improve performance of query which use connection to hive in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836587#M330780</link>
    <description>&lt;P&gt;How many rows are extracted into your SAS table? If it is a lot rerun your query changing&amp;nbsp;&lt;SPAN&gt;&amp;amp;selstr for count(*). This will measure the performance of the query itself only and not the time to transfer the data across your network back to SAS. Please post the time for this query.&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Oct 2022 19:25:15 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2022-10-03T19:25:15Z</dc:date>
    <item>
      <title>need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836557#M330765</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using SAS EG 7.1 and writing some query such as parsethrough which use using to connect to the hive and accessing data from Hadoop but its taking to much time to execute simple queries and i need to improve the performance of query , so could you please help me to improve performance .&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%let selstr =&lt;BR /&gt;int(test_acc/100) as pac,&lt;BR /&gt;test_acc-100 * int(test_acc/100) as test_sub_acc,&lt;BR /&gt;date-&amp;amp;dt_1. as comp_dt,&lt;BR /&gt;lstcapdte-&amp;amp;dt_2. as lstcap_dt,&lt;BR /&gt;test_int_rate/100 as int_rate,&lt;BR /&gt;Test_cur_bal as Bal,&lt;BR /&gt;edh_ingest_ts;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;connect to hadoop (&lt;BR /&gt;&amp;nbsp;server="XXXXXXXXXXXX"&lt;BR /&gt;&amp;nbsp;uri="jdbc:hive2://XXXXXX,XXXX,XXXX/;&lt;BR /&gt;&amp;nbsp;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=XXXserver2" HIVE_KERBEROS_PRINCIPAL="hive/XXXXXX"&lt;BR /&gt;&amp;nbsp;port=xxxxx&lt;BR /&gt;&amp;nbsp;schema=xxxxx&lt;BR /&gt;&amp;nbsp;read_method=JDBC&lt;BR /&gt;&amp;nbsp;properties="mapred.job.queue.name=XXXX_default;hive.fetch.task.conversion=minimal;hive.cbo.enable=true;&lt;BR /&gt;&amp;nbsp;hive.compute.query.using.stats=true;hive.stats.fetch.column.stats=true;hive.stats.fetch.partition.stats=true;&lt;BR /&gt;&amp;nbsp;hive.prewarm.enabled=true;"&lt;BR /&gt;&amp;nbsp;dbmax_text=255&lt;BR /&gt;&amp;nbsp;);&lt;BR /&gt;&amp;nbsp;create table work.test as&lt;BR /&gt;&amp;nbsp;select&lt;BR /&gt;&amp;nbsp;*&lt;BR /&gt;&amp;nbsp;from connection to hadoop (&lt;BR /&gt;&amp;nbsp;select&lt;BR /&gt;&amp;nbsp;&amp;amp;selstr.&lt;BR /&gt;&amp;nbsp;from &amp;amp;table.&lt;BR /&gt;&amp;nbsp;where to_date(process_date_time) &amp;gt;=&amp;amp;test_date_1. and to_date(process_date_time) &amp;lt;=&amp;amp;test_date_2.&lt;BR /&gt;&amp;nbsp;);&lt;BR /&gt;disconnect from hadoop;&lt;/P&gt;&lt;P&gt;quit;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;/*processing time as below*/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;real time 35:37.20&lt;BR /&gt;2 The SAS System 09:28 Thursday, September 29, 2022&lt;/P&gt;&lt;P&gt;user cpu time 1:19.61&lt;BR /&gt;system cpu time 26.64 seconds&lt;BR /&gt;memory 5633.62k&lt;BR /&gt;OS Memory 43176.00k&lt;BR /&gt;Timestamp 29/09/2022 03:46:38 PM&lt;BR /&gt;Step Count 8 Switch Count 3&lt;BR /&gt;Page Faults 68&lt;BR /&gt;Page Reclaims 440&lt;BR /&gt;Page Swaps 0&lt;BR /&gt;Voluntary Context Switches 92058&lt;BR /&gt;Involuntary Context Switches 26290&lt;BR /&gt;Block Input Operations 0&lt;BR /&gt;Block Output Operations 0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 16:22:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836557#M330765</guid>
      <dc:creator>hema_bhure</dc:creator>
      <dc:date>2022-10-03T16:22:25Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836587#M330780</link>
      <description>&lt;P&gt;How many rows are extracted into your SAS table? If it is a lot rerun your query changing&amp;nbsp;&lt;SPAN&gt;&amp;amp;selstr for count(*). This will measure the performance of the query itself only and not the time to transfer the data across your network back to SAS. Please post the time for this query.&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 19:25:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836587#M330780</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-10-03T19:25:15Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836653#M330809</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;no there is not changing the variable into the &amp;amp;selstr. The selstr is limited for this query only.&lt;/P&gt;&lt;P&gt;I am just copying the the processing time and number of observation below.&lt;/P&gt;&lt;P&gt;please let me know how i can optimised it, i tried libname approach also but the still query&amp;nbsp; takes lot of time to extract data.&lt;/P&gt;&lt;P&gt;----------------------------------------------------------------------------------------------------------------------------&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.test decreased size by 0.71 percent.&lt;BR /&gt;Compressed is 58707 pages; un-compressed would require 59127 pages.&lt;BR /&gt;NOTE: Table WORK.test created, with 43930975 rows and 13 columns.&lt;/P&gt;&lt;P&gt;45 ;&lt;BR /&gt;46 disconnect from hadoop;&lt;BR /&gt;47 quit;&lt;BR /&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;real time 35:37.20&lt;BR /&gt;2 The SAS System 09:28 Thursday, September 29, 2022&lt;/P&gt;&lt;P&gt;user cpu time 1:19.61&lt;BR /&gt;system cpu time 26.64 seconds&lt;BR /&gt;memory 5633.62k&lt;BR /&gt;OS Memory 43176.00k&lt;BR /&gt;Timestamp 29/09/2022 03:46:38 PM&lt;BR /&gt;Step Count 8 Switch Count 3&lt;BR /&gt;Page Faults 68&lt;BR /&gt;Page Reclaims 440&lt;BR /&gt;Page Swaps 0&lt;BR /&gt;Voluntary Context Switches 92058&lt;BR /&gt;Involuntary Context Switches 26290&lt;BR /&gt;Block Input Operations 0&lt;BR /&gt;Block Output Operations 0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 08:10:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836653#M330809</guid>
      <dc:creator>hema_bhure</dc:creator>
      <dc:date>2022-10-04T08:10:11Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836671#M330816</link>
      <description>&lt;P&gt;It's likely that the data transfer is what consumes the time.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;suggested run your query with a select statement that only returns a single row of data as this will tell us if it's your where clause or the data transfer that consumes the time.&lt;/P&gt;
&lt;P&gt;Can you please re-run the code with the following change&lt;/P&gt;
&lt;PRE&gt;%let selstr = count(*) as n_rows;&lt;/PRE&gt;</description>
      <pubDate>Tue, 04 Oct 2022 09:41:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836671#M330816</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-10-04T09:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836681#M330818</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;following is n_row count and log&amp;nbsp;&lt;/P&gt;&lt;P&gt;n_rows&lt;BR /&gt;40073670&lt;/P&gt;&lt;P&gt;------------------------------------------------------------------------------------------------------------------------------------&lt;/P&gt;&lt;P&gt;-------------------------log-----------------------------&lt;/P&gt;&lt;P&gt;24&lt;BR /&gt;25 GOPTIONS ACCESSIBLE;&lt;BR /&gt;26 %let selstr = count(*) as n_rows;&lt;BR /&gt;27 proc sql;&lt;BR /&gt;connect to hadoop (&lt;BR /&gt;server="XXXXXXXXXXXX"&lt;BR /&gt;uri="jdbc:hive2://XXXXXX,XXXX,XXXX/;&lt;BR /&gt;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=XXXserver2" HIVE_KERBEROS_PRINCIPAL="hive/XXXXXX"&lt;BR /&gt;port=xxxxx&lt;BR /&gt;schema=xxxxx&lt;BR /&gt;read_method=JDBC&lt;BR /&gt;properties="mapred.job.queue.name=XXXX_default;hive.fetch.task.conversion=minimal;hive.cbo.enable=true;&lt;BR /&gt;hive.compute.query.using.stats=true;hive.stats.fetch.column.stats=true;hive.stats.fetch.partition.stats=true;&lt;BR /&gt;hive.prewarm.enabled=true;"&lt;BR /&gt;dbmax_text=255&lt;BR /&gt;);&lt;BR /&gt;create table work.test as&lt;BR /&gt;select&lt;BR /&gt;*&lt;BR /&gt;from connection to hadoop (&lt;BR /&gt;select&lt;BR /&gt;&amp;amp;selstr.&lt;BR /&gt;from &amp;amp;table.&lt;BR /&gt;where to_date(process_date_time) &amp;gt;=&amp;amp;test_date_1. and to_date(process_date_time) &amp;lt;=&amp;amp;test_date_2.&lt;BR /&gt;);&lt;BR /&gt;disconnect from hadoop;&lt;/P&gt;&lt;P&gt;quit;&lt;BR /&gt;NOTE: Compression was disabled for data set WORK.TEST because compression overhead would increase the size of the data set.&lt;BR /&gt;NOTE: Table WORK.TEST created, with 1 rows and 1 columns.&lt;/P&gt;&lt;P&gt;51 ;&lt;BR /&gt;2 The SAS System 09:56 Tuesday, October 4, 2022&lt;/P&gt;&lt;P&gt;52 disconnect from hadoop;&lt;BR /&gt;53 quit;&lt;BR /&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;real time 17:21.22&lt;BR /&gt;user cpu time 0.12 seconds&lt;BR /&gt;system cpu time 0.08 seconds&lt;BR /&gt;memory 5598.90k&lt;BR /&gt;OS Memory 43688.00k&lt;BR /&gt;Timestamp 04/10/2022 11:18:42 AM&lt;BR /&gt;Step Count 9 Switch Count 3&lt;BR /&gt;Page Faults 0&lt;BR /&gt;Page Reclaims 312&lt;BR /&gt;Page Swaps 0&lt;BR /&gt;Voluntary Context Switches 5065&lt;BR /&gt;Involuntary Context Switches 50&lt;BR /&gt;Block Input Operations 0&lt;BR /&gt;Block Output Operations 0&lt;/P&gt;&lt;P&gt;54&lt;BR /&gt;55 GOPTIONS NOACCESSIBLE;&lt;BR /&gt;56 %LET _CLIENTTASKLABEL=;&lt;BR /&gt;57 %LET _CLIENTPROCESSFLOWNAME=;&lt;BR /&gt;58 %LET _CLIENTPROJECTPATH=;&lt;BR /&gt;59 %LET _CLIENTPROJECTPATHHOST=;&lt;BR /&gt;60 %LET _CLIENTPROJECTNAME=;&lt;BR /&gt;61 %LET _SASPROGRAMFILE=;&lt;BR /&gt;62 %LET _SASPROGRAMFILEHOST=;&lt;BR /&gt;63&lt;BR /&gt;64 ;*';*";*/;quit;run;&lt;BR /&gt;65 ODS _ALL_ CLOSE;&lt;BR /&gt;66&lt;BR /&gt;67&lt;BR /&gt;68 QUIT; RUN;&lt;BR /&gt;69&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 10:44:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836681#M330818</guid>
      <dc:creator>hema_bhure</dc:creator>
      <dc:date>2022-10-04T10:44:52Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836728#M330826</link>
      <description>&lt;P&gt;I think this might be a Hive optimization problem.&lt;/P&gt;
&lt;P&gt;You should execute the query directly into Hive.&lt;/P&gt;
&lt;P&gt;Hive is not a very fast engine, especially for small-medium sized data (whatever that is).&lt;/P&gt;
&lt;P&gt;I have experienced that Impala is usually more performant, but that may require additional SAS/ACCESS licensing.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 14:32:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836728#M330826</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2022-10-04T14:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836734#M330828</link>
      <description>Thank you &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt; , i can execute query in Hive but this source for us and we are doing etL into SAS. so i need to connection to Hadoop to access the data. thank you once again.</description>
      <pubDate>Tue, 04 Oct 2022 15:27:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836734#M330828</guid>
      <dc:creator>hema_bhure</dc:creator>
      <dc:date>2022-10-04T15:27:04Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836736#M330830</link>
      <description>&lt;P&gt;I didn't mean to do it as permanent solution, just as a way to see if the excessive execution time is in the Hive query or in the data transport to SAS.&lt;/P&gt;
&lt;P&gt;So execute it, measure the time and compare with your SAS SQL pass through measurements.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 15:37:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836736#M330830</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2022-10-04T15:37:58Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836788#M330851</link>
      <description>&lt;P&gt;The row count test takes less than half the time of the original SAS table query. That suggests the network speed between Hive and SAS is slowing performance a lot. There is not a lot you can do about that, except splitting your query and running it in parallel over different date periods. You might get better performance outside of normal business hours as well so you could schedule it to run overnight when run time isn't so important.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 19:17:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836788#M330851</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-10-04T19:17:45Z</dc:date>
    </item>
    <item>
      <title>Re: need to improve performance of query which use connection to hive</title>
      <link>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836991#M330954</link>
      <description>&lt;P&gt;thank you&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;, I will try this.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Oct 2022 15:48:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/need-to-improve-performance-of-query-which-use-connection-to/m-p/836991#M330954</guid>
      <dc:creator>hema_bhure</dc:creator>
      <dc:date>2022-10-05T15:48:05Z</dc:date>
    </item>
  </channel>
</rss>

