<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: SAS Pass through connection to hadoop in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/426232#M13137</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112354"&gt;@krishnaram101&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I notice that port 8443 is listed, which means Apache Knox is probably involved. This could impact how the data is brought back to SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of things to look at:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) Make sure that you aren't being hit with the 32k String Thing (Hive STRING types being brought back to SAS as 32,767-byte character columns). This significantly increases the amount of network traffic and makes writing the SAS data set to disk slower. The GitHub link below includes an example of how to tell if this is happening. You can also look at the metadata for the created table to see whether any 32k columns are included. Because you are returning 8K columns, you may be moving a lot of data; if the 32k String Thing were happening, it would likely fill your disk.&lt;/P&gt;
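&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough sketch (the libref, server, schema, and length below are placeholders, not values from this thread), the DBMAX_TEXT= option caps how wide Hive STRING columns come back to SAS; it can also be specified on a CONNECT TO HADOOP statement for pass-through:&lt;/P&gt;
&lt;P&gt;/* Cap Hive STRING columns at 1,024 characters instead of 32,767 */&lt;BR /&gt;libname hdp hadoop server='myserver' schema=default dbmax_text=1024;&lt;/P&gt;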
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) See how long this query takes to run. If the times are similar, it means most of the time is being spent by Hadoop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;create table sastest.test as select * from connection to hadoop&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;( select count(*) from test limit 100000);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For more information about the 32k String Thing see the slides and exercises in this workshop - &lt;A href="https://github.com/Jeff-Bailey/SGF2016_SAS3880_Insiders_Guide_Hadoop_HOW" target="_self"&gt;https://github.com/Jeff-Bailey/SGF2016_SAS3880_Insiders_Guide_Hadoop_HOW&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also check out&amp;nbsp;this SAS Global Forum paper: &lt;A href="http://support.sas.com/resources/papers/proceedings17/SAS0190-2017.pdf" target="_blank"&gt;Ten Tips to Unlock the Power of Hadoop with SAS®&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this doesn't help, you may want to consider opening a tech support track.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best wishes,&lt;BR /&gt;Jeff&lt;/P&gt;</description>
    <pubDate>Tue, 13 Feb 2018 16:15:12 GMT</pubDate>
    <dc:creator>JBailey</dc:creator>
    <dc:date>2018-02-13T16:15:12Z</dc:date>
    <item>
      <title>SAS Pass through connection to hadoop</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/316872#M9099</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there an efficient way to use a SAS pass-through connection with Hadoop when importing huge datasets? The data has nearly 2M rows and 8K columns. Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2016 22:49:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/316872#M9099</guid>
      <dc:creator>krishnaram101</dc:creator>
      <dc:date>2016-12-05T22:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Pass through connection to hadoop</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/316995#M9104</link>
      <description>&lt;P&gt;Please define "import".&lt;/P&gt;
&lt;P&gt;From what format, to which format?&lt;/P&gt;
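&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One way to transform data in place, without pulling rows back to SAS, is to push the SQL into Hive. A hedged sketch (connection options and table names are hypothetical):&lt;/P&gt;
&lt;P&gt;proc sql;&lt;BR /&gt;&amp;nbsp; connect to hadoop (server='myserver' schema=default);&lt;BR /&gt;&amp;nbsp; /* Runs entirely inside Hive; no rows come back to SAS */&lt;BR /&gt;&amp;nbsp; execute (create table target stored as orc as select * from source) by hadoop;&lt;BR /&gt;&amp;nbsp; disconnect from hadoop;&lt;BR /&gt;quit;&lt;/P&gt;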
&lt;P&gt;If the data shouldn't "touch" SAS during import, you could use EXECUTE blocks in PROC SQL to Hive, or PROC HADOOP for operations outside Hive.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2016 12:20:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/316995#M9104</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-12-06T12:20:42Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Pass through connection to hadoop</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/317273#M9113</link>
      <description>&lt;P&gt;I used the following code to access data from Hadoop. It took me 6 hrs to get 100,000 records with 8K columns, which seems very slow. Without the options, it took 8 hrs. Can you please check and give suggestions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;options SGIO=yes;&lt;BR /&gt;options bufno=2000 bufsize=48K;&lt;BR /&gt;libname sastest 'E:\SASMA\SASUserData\User\krishnaramasamy\Hadoop data';&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;connect to hadoop (user=%LOWCASE(&amp;amp;SYSUSERID.) password="XXXXX"&lt;BR /&gt;server='YYYYYY' uri='jdbc:hive2://YYYYYYY.com:8443/default?hive.server2.transport.mode=http;hive.execution.engine=tez;hive.server2.thrift.http.path=gateway/hdpprod/hive' schema=ZZZZZ);&lt;BR /&gt;create table sastest.test as select * from connection to hadoop&lt;BR /&gt;(&lt;BR /&gt;select * from test&lt;BR /&gt;limit 100000&lt;BR /&gt;);&lt;BR /&gt;disconnect from hadoop ;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2016 12:24:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/317273#M9113</guid>
      <dc:creator>krishnaram101</dc:creator>
      <dc:date>2016-12-07T12:24:42Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Pass through connection to hadoop</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/317411#M9116</link>
      <description>&lt;P&gt;This seems like a Hadoop/Hive admin issue, not a SAS one, since it's the query inside Hive that takes the time (unless you have extremely small&amp;nbsp;bandwidth to the Hadoop cluster).&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2016 18:37:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/317411#M9116</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-12-07T18:37:00Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Pass through connection to hadoop</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/426232#M13137</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112354"&gt;@krishnaram101&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I notice that port 8443 is listed, which means Apache Knox is probably involved. This could impact how the data is brought back to SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of things to look at:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) Make sure that you aren't being hit with the 32k String Thing (Hive STRING types being brought back to SAS as 32,767-byte character columns). This significantly increases the amount of network traffic and makes writing the SAS data set to disk slower. The GitHub link below includes an example of how to tell if this is happening. You can also look at the metadata for the created table to see whether any 32k columns are included. Because you are returning 8K columns, you may be moving a lot of data; if the 32k String Thing were happening, it would likely fill your disk.&lt;/P&gt;
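&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough sketch (the libref, server, schema, and length below are placeholders, not values from this thread), the DBMAX_TEXT= option caps how wide Hive STRING columns come back to SAS; it can also be specified on a CONNECT TO HADOOP statement for pass-through:&lt;/P&gt;
&lt;P&gt;/* Cap Hive STRING columns at 1,024 characters instead of 32,767 */&lt;BR /&gt;libname hdp hadoop server='myserver' schema=default dbmax_text=1024;&lt;/P&gt;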
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) See how long this query takes to run. If the times are similar, it means most of the time is being spent by Hadoop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;create table sastest.test as select * from connection to hadoop&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;( select count(*) from test limit 100000);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For more information about the 32k String Thing see the slides and exercises in this workshop - &lt;A href="https://github.com/Jeff-Bailey/SGF2016_SAS3880_Insiders_Guide_Hadoop_HOW" target="_self"&gt;https://github.com/Jeff-Bailey/SGF2016_SAS3880_Insiders_Guide_Hadoop_HOW&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also check out&amp;nbsp;this SAS Global Forum paper: &lt;A href="http://support.sas.com/resources/papers/proceedings17/SAS0190-2017.pdf" target="_blank"&gt;Ten Tips to Unlock the Power of Hadoop with SAS®&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this doesn't help, you may want to consider opening a tech support track.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best wishes,&lt;BR /&gt;Jeff&lt;/P&gt;</description>
      <pubDate>Tue, 13 Feb 2018 16:15:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Pass-through-connection-to-hadoop/m-p/426232#M13137</guid>
      <dc:creator>JBailey</dc:creator>
      <dc:date>2018-02-13T16:15:12Z</dc:date>
    </item>
  </channel>
</rss>

