I want to set a Hive property as follows:
EXECUTE(set hive.resultset.use.unique.column.names = FALSE) BY &Conn;
The above command works fine in explicit pass through SQL, like so:
PROC SQL;
CONNECT TO &SAS_Eng AS &Conn (&Connect);
EXECUTE(set hive.resultset.use.unique.column.names = FALSE) BY &Conn;
EXECUTE(set hive.execution.engine = &Engine) BY &Conn;
EXECUTE(set mapred.job.name = &Type._Members_Direct) BY &Conn;
CREATE TABLE WORK.Members AS
SELECT *
FROM CONNECTION TO &Conn (
SELECT *
FROM &Schema..ALL_MEMBERS
);
DISCONNECT FROM &Conn;
QUIT;
However, a lot of our analysts don't want to code all the details of explicit pass through code for every little query. They just want to use regular Proc SQL. Normally, I'd code a PROPERTIES= option on a Libname using the Hadoop engine, but on this particular machine, we only have ODBC licensed. When I try to code a PROPERTIES= on the Libname with an ODBC engine, I get:
ERROR 22-7: Invalid option name PROPERTIES.
Apparently, PROPERTIES is not a valid option when using the ODBC engine.
Can anyone tell me how to set a Hive property with regular Proc SQL when only the ODBC engine is available?
Jim
Why not call
proc sql;
&sql_hadoop_header;
create table ...
quit;
Thank you.
So, would &SQL_Hadoop_Header resolve to something like the following?
CONNECT TO &SAS_Eng AS &Conn (&Connect);
EXECUTE(set hive.resultset.use.unique.column.names = FALSE) BY &Conn;
EXECUTE(set hive.execution.engine = &Engine) BY &Conn;
EXECUTE(set mapred.job.name = &Type._Members_Direct) BY &Conn;
That's certainly workable, but then all queries have to be coded in HQL instead of SAS SQL. One advantage of using just regular Proc SQL is that our analysts can code in a syntax that they're already familiar with and can use SAS functions (e.g. date functions, string functions, etc.) instead of having to learn the Hadoop/Hive equivalents. It may be necessary to go that route if we want to set the hive.resultset.use.unique.column.names, but we'd rather not if there's another way.
I'm a bit frustrated. It's such a simple thing, and I know of two different ways to do it, unfortunately, neither of which works in this instance.
Jim
> all queries have to be coded in HQL instead of SAS SQL
Not necessarily, you can have SAS code and/or native code. And you could have parameters too.
proc sql;
%SQL_Hadoop_Header(connection=&conxn, options=nouniquename);
select datepart(DATE) from connection to &conxn ( hql... );
Oh, dear. I think you lost me there. Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL?
If I execute explicit pass through code and set the hive.resultset.use.unique.column.names, that's all well and fine, but as long as the connection is open, doesn't all code have to be HQL or Hive commands? If I try to execute SAS SQL, I'll get syntax errors, won't I? Now, I could disconnect first, but won't my setting for hive.resultset.use.unique.column.names disappear as soon as I disconnect? Or will the setting persist so long as I don't issue a QUIT in Proc SQL?
I suppose I should try it. Back in a bit...
Jim
> Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL?
Yes.
All the SQL that includes select .. from connection to xxx () will be sent to Hive, but all the regular SQL such as select * from SASHELP.CLASS will stay in SAS, even if a connection is open.
You can even open several connections in the same proc sql step.
Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL? I'm working on it, and it runs, but it doesn't seem that a setting applied via a connection carries over to a SAS SQL query.
Jim
There's some larger issue going on here. Even with explicit pass through code, I can no longer shut off the table name from being prefixed to the column name.
Here are the results from some explicit pass through code. The first display is just whatever the value is after I make the connection (i.e. the default value). Note that it is "true".
Then, I execute a SET statement and thereafter display the value. Note that the value is now "false".
Lastly, I execute a query -- but the column names still include the table name. The hive.resultset.use.unique.column.names parameter has no apparent effect.
Jim
> Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL?
I don't know, though it looks like it from your last post. SAS Tech Support should know.
> Then, I execute a SET statement and thereafter display the value. Note that the value is now "false".
I don't know this option. Ask the Hadoop admins to have a look? Or on a web forum for the ODBC driver (Impala?)
I'll have to ask SAS tech support about the settings in a direct connection being applicable to regular Proc SQL code run in the same Proc SQL session. I suspect they don't apply. In this case, I switched everything to explicit pass through code because I didn't know how to set hive.resultset.use.unique.column.names any other way. The odd thing of it is that even though I set the value to 'false', the table names are still prefixed to the column names. Setting the parameter manually in my query just isn't working.
I think the corrective action has to be a change to the Hive XML as described in this SAS usage note:
No worries, glad I could help. 🙂
I suspect each execute/select statement runs in its own environment, even though it is the same connection.
The UNote does not state where that XML file sits. On the client PC or on the Hadoop server?
The XML file is on the server side. It controls the behavior of Hadoop and isn't specifically related to SAS.
I've been talking with our platform support folks, but we can't figure out why, when I set the parameter correctly to false, the table names are still being prefixed to the column names. When our support folks use command line Hive commands, it works fine.
As a work around at least for now, I wrote a macro that removes the table names from the column names after the Hive query completes. A bit of a kludge, but I prefer not to use 'non_V7_SAS_name'n type column names.
Code below if it were of interest.
Jim
&Null %MACRO Remove_Table_From_Column(Lib, Table, Order=VARNUM, Debug=NO);
%LOCAL Cmnt;
%LOCAL NoPrint;
%LOCAL Vars;
%IF %QUPCASE(&Debug) = YES %THEN
%DO;
%LET Cmnt = ;
%LET NoPrint = ;
%PUT &Nte1 Debug is on in macro &SysMacroName;
%END;
%ELSE
%DO;
%LET Cmnt = *;
%LET NoPrint = NOPRINT;
%END;
%&Cmnt.TITLE(BEFORE - Contents of &Lib..&Table);
PROC CONTENTS DATA=&Lib..&Table
OUT =WORK.&Table.V
&Order
&NoPrint
;
RUN;
PROC SQL NOPRINT;
SELECT CATS("'", Name, "'n=",SCAN(SUBSTR(Name, INDEXC(Name, '.')), 1))
INTO : Vars SEPARATED BY ' '
FROM WORK.&Table.V
;
QUIT;
%&Cmnt.PUT &Nte1 &=Vars;
PROC DATASETS LIBRARY=&Lib NOLIST;
MODIFY &Table;
RENAME
&Vars
;
QUIT;
%&Cmnt.TITLE(AFTER - Contents of &Lib..&Table);
PROC CONTENTS DATA=&Lib..&Table
OUT =WORK.&Table.V
&Order
&NoPrint
;
RUN;
%MEND Remove_Table_From_Column;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.