<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: File upload compression when bulkloading with Impala in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485662#M126205</link>
    <description>&lt;P&gt;There is no such setting as far as I know.&lt;/P&gt;
&lt;P&gt;Maybe the network layer can compress packets?&lt;/P&gt;
&lt;P&gt;Hive should be faster for sending to HDFS.&lt;/P&gt;</description>
    <pubDate>Fri, 10 Aug 2018 05:32:54 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2018-08-10T05:32:54Z</dc:date>
    <item>
      <title>File upload compression when bulkloading with Impala</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485541#M126164</link>
      <description>&lt;P&gt;I'm currently using the Cloudera ODBC driver for Impala to bulkload datasets to Hadoop.&amp;nbsp; Part of the underlying workflow is that the dataset is written to a temporary text file on the client and then transferred to a directory in HDFS (/tmp).&amp;nbsp; &lt;STRONG&gt;Is there a setting to have that text file compressed in order to speed up this transfer (that is, compressed on the client, uploaded, and then uncompressed on the host HDFS system)?&lt;/STRONG&gt;&amp;nbsp; Compressing the text file before the transfer would greatly speed up the process.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The client is a SAS 9.4 (M4) workstation with SAS/ACCESS to Impala (9.43) on a Windows 7 machine.&amp;nbsp; The host HDFS is a Kerberized environment running the Cloudera distribution of Hadoop.&amp;nbsp; The connection goes through an ssh tunnel to the host.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is an example of the SAS syntax for creating the table with the bulkload option.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Connect to cluster using Impala ODBC driver */
libname hdp impala dsn='DPL Impala 64bit' schema= test ; 
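
/* Hedged alternative sketch, not part of the original post: a reply in
   this thread suggests Hive is faster than Impala for writes to HDFS.
   Assuming SAS/ACCESS to Hadoop (Hive) is licensed, the same table
   could be written through a Hive libname instead; 'my-hive-host' is a
   placeholder server name, and the exact connection options depend on
   the cluster configuration.

libname hv hadoop server='my-hive-host' schema=test;
data hv.my_table (dbcreate_table_opts='stored as parquet');
    set work.my_table;
run;
*/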

/* Create the table using bulkload */
data hdp.my_table (bulkload=yes dbcreate_table_opts='stored as parquet') ;
    set work.my_table ;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2018 17:42:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485541#M126164</guid>
      <dc:creator>Deron</dc:creator>
      <dc:date>2018-08-09T17:42:04Z</dc:date>
    </item>
    <item>
      <title>Re: File upload compression when bulkloading with Impala</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485662#M126205</link>
      <description>&lt;P&gt;There is no such setting as far as I know.&lt;/P&gt;
&lt;P&gt;Maybe the network layer can compress packets?&lt;/P&gt;
&lt;P&gt;Hive should be faster for sending to HDFS.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 05:32:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485662#M126205</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-08-10T05:32:54Z</dc:date>
    </item>
    <item>
      <title>Re: File upload compression when bulkloading with Impala</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485970#M126338</link>
      <description>&lt;P&gt;Thanks for the post and the idea. I will pursue it if I get time, but for now I will have to load large datasets manually instead of using the SAS-Impala "bulkload" option.&amp;nbsp; So I'm pretty much back where I started, after spending many hours getting SAS-Hadoop connectivity working via the ssh tunnel.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 21:58:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/485970#M126338</guid>
      <dc:creator>Deron</dc:creator>
      <dc:date>2018-08-10T21:58:48Z</dc:date>
    </item>
    <item>
      <title>Re: File upload compression when bulkloading with Impala</title>
      <link>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/486139#M126424</link>
      <description>Impala for reading, Hive for writing seems to be the rule of thumb.</description>
      <pubDate>Sun, 12 Aug 2018 10:29:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/File-upload-compression-when-bullkloading-with-Impala/m-p/486139#M126424</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-08-12T10:29:28Z</dc:date>
    </item>
  </channel>
</rss>

