OGA
Fluorite | Level 6

Hi

Our users access Hadoop with SAS/ACCESS Interface to Hadoop from SAS Enterprise Guide.

I would like Parquet to be the default format whenever they create tables.

 

To force this, I have added the following option to the LIBNAME definition in SAS Management Console under LIBNAME properties > Options > Advanced Options > Other Properties:

DBCREATE_TABLE_OPTS="stored as Parquet"
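With this option in place, any table created implicitly through the libref should come out as Parquet. For example (the table name is just an illustration):

/* Implicit SQL through the libref: the engine generates the Hive
   CREATE TABLE and appends the DBCREATE_TABLE_OPTS text */
proc sql;
   create table hvinflab.class_parquet as
   select * from sashelp.class;
quit;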

 

This works fine with implicit SQL, but with explicit SQL pass-through the option is ignored and the table is created as a text file, which is the default format in Hive. Users connect to Hadoop with explicit pass-through via a preassigned libref called HVINFLAB:

proc sql;
   connect using HVINFLAB as HADOOPL;
   execute by HADOOPL (
      create table ci0101_tablaneglab_sas5 as
      select ci0101_numpersona
      from 04_negocio.ci0101
      where ci0101_nroperiod = 2003
        and meta_codienti = '2095'
   );
   disconnect from HADOOPL;
quit;

 

This is the LIBNAME statement I get from SAS Management Console:

LIBNAME hvinflab HADOOP URI="jdbc:hive2://kbzae01pro-acc.bbk.es:10002/04_infra_lab;principal=hive/kbzae01pro.bbk.es@BBK.ES;ssl=true" DBCREATE_TABLE_OPTS="stored as Parquet" PORT=10002 SERVER="kbzae01pro-acc.bbk.es" SCHEMA="04_infra_lab" ;

 

I have also tried the "Hive Properties" field under LIBNAME properties > Advanced Options > Connection, adding the following:

hive.default.fileformat=Parquet

This is the libname statement I get:

LIBNAME hvinflab HADOOP URI="jdbc:hive2://kbzae01pro-acc.bbk.es:10002/04_infra_lab;principal=hive/kbzae01pro.bbk.es@BBK.ES;ssl=true"  PROPERTIES="hive.default.fileformat=Parquet"  PORT=10002 SERVER="kbzae01pro-acc.bbk.es" 

 

However, SAS also ignores this field. It seems that when the "Hive JDBC URL" field is used (which translates to URI= in the libname statement), the "Hive Properties" field is ignored.

 

Is there a way to set Parquet as the default Hadoop file format so that explicit SQL pass-through does not ignore it? I know the format can be specified on the CREATE TABLE statement itself, but I am looking for another option so that users do not have to spell it out every time they create a table with explicit SQL pass-through.
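For reference, this is the per-statement workaround I would like to avoid, sketched on the same query as above:

proc sql;
   connect using HVINFLAB as HADOOPL;
   execute by HADOOPL (
      /* explicit STORED AS clause on the Hive CTAS statement */
      create table ci0101_tablaneglab_sas5
      stored as parquet
      as select ci0101_numpersona
         from 04_negocio.ci0101
         where ci0101_nroperiod = 2003
           and meta_codienti = '2095'
   );
   disconnect from HADOOPL;
quit;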

 

Thanks in advance

5 REPLIES
AnandVyas
Ammonite | Level 13

To connect to Hadoop from SAS, the cluster's JAR and XML files are copied to the SAS compute server. Among the XML files is hive-site.xml, from which the Hive properties for the SAS session are read. I think you can change the default file format property, hive.default.fileformat, in that file and see if it takes effect.
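Something like the following entry in hive-site.xml (untested sketch, assuming the standard file layout on your compute server):

<!-- Hive should use this format when CREATE TABLE has no STORED AS clause -->
<property>
  <name>hive.default.fileformat</name>
  <value>Parquet</value>
</property>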

 

I haven't tried this myself so let me know if it works for you.

 

Source:

Property configuration Manual: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-...

Supported file format types:

https://cwiki.apache.org/confluence/display/Hive/FileFormats#:~:text=The%20hive.,is%20the%20paramete...

OGA
Fluorite | Level 6

Hi Anand

 

I don't have write access to those configuration files, so I opted for the other suggestion because it seemed easier to implement.

 

Nevertheless, thanks a lot for your help 

Patrick
Opal | Level 21 (Accepted Solution)

It makes sense to me that SAS doesn't add the STORED AS PARQUET clause to explicit pass-through SQL: SAS is not supposed to touch explicit SQL at all.

I would expect that for implicit SQL the STORED AS PARQUET clause does get added to the SQL sent to Hadoop. Besides training your users to add STORED AS PARQUET to the explicit SQL they write, you could try changing the default via the DBCONINIT libname option, something like dbconinit='set hive.default.fileformat=Parquet;'.
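An untested sketch, reusing the connection details from the LIBNAME statement in your post. DBCONINIT executes the given command after each connection to the data source, so it should also kick in when users CONNECT USING the libref:

/* Assumption: the SET command issued by DBCONINIT makes Hive default
   new tables to Parquet when no STORED AS clause is given */
LIBNAME hvinflab HADOOP
   URI="jdbc:hive2://kbzae01pro-acc.bbk.es:10002/04_infra_lab;principal=hive/kbzae01pro.bbk.es@BBK.ES;ssl=true"
   SERVER="kbzae01pro-acc.bbk.es" PORT=10002 SCHEMA="04_infra_lab"
   DBCREATE_TABLE_OPTS="stored as Parquet"
   DBCONINIT='set hive.default.fileformat=Parquet;';

In SAS Management Console you would presumably add the DBCONINIT option to the same Other Properties field you already use for DBCREATE_TABLE_OPTS.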

Patrick
Opal | Level 21

Thanks for the feedback. That's good to know!
