BookmarkSubscribeRSS Feed
jimbarbour
Meteorite | Level 14

I want to set a Hive property as follows:

 

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

 

 

The above command works fine in explicit pass through SQL, like so:

 

PROC	SQL;
	CONNECT	TO	&SAS_Eng	AS	&Conn	(&Connect);

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

	EXECUTE(set	hive.execution.engine						=	&Engine) 					BY	&Conn;

	EXECUTE(set	mapred.job.name								=	&Type._Members_Direct)		BY	&Conn;

	CREATE	TABLE	WORK.Members							AS
		SELECT	*	
			FROM	CONNECTION	TO	&Conn					(
				SELECT	*
					FROM	&Schema..ALL_MEMBERS
						);

	DISCONNECT	FROM	&Conn;
QUIT;

 

However, a lot of our analysts don't want to code all the details of explicit pass through code for every little query.  They just want to use regular Proc SQL.  Normally, I'd code a PROPERTIES= option on a Libname using the Hadoop engine, but on this particular machine, we only have ODBC licensed.  When I try to code a PROPERTIES= on the Libname with an ODBC engine, I get:

ERROR 22-7: Invalid option name PROPERTIES.

Apparently, PROPERTIES is not a valid option when using the ODBC engine.

 

Can anyone tell me how to set a Hive property with regular Proc SQL when only the ODBC engine is available?

 

Jim

 

 

 

13 REPLIES 13
ChrisNZ
Tourmaline | Level 20

Why not call

proc sql; 

  &sql_hadoop_header;

  create table ...

quit;

 

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

Thank you.

 

So, would &SQL_Hadoop_Header resolve to something like the following?

	CONNECT	TO	&SAS_Eng	AS	&Conn	(&Connect);

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

	EXECUTE(set	hive.execution.engine						=	&Engine) 					BY	&Conn;

	EXECUTE(set	mapred.job.name								=	&Type._Members_Direct)		BY	&Conn;

That's certainly workable, but then all queries have to be coded in HQL instead of SAS SQL.  One advantage of using just regular Proc SQL is that our analysts can code in a syntax that they're already familiar with and can use SAS functions (e.g. date functions, string functions, etc.) instead of having to learn the Hadoop/Hive equivalents.  It may be necessary to go that route if we want to set the hive.resultset.use.unique.column.names, but we'd rather not if there's another way.

 

I'm a bit frustrated.  It's such a simple thing, and I know of two different ways to do it, unfortunately, neither of which works in this instance.

 

Jim

ChrisNZ
Tourmaline | Level 20

> all queries have to be coded in HQL instead of SAS SQL

Not necessarily, you can have SAS code and/or native code. And you could have parameters too.

proc sql;

  %SQL_Hadoop_Header(connection=&conxn, options=nouniquename);

  select datepart(DATE) from connection to &conxn (   hql...  );

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

Oh, dear.  I think you lost me there.  Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL? 

 

If I execute explicit pass through code and set the hive.resultset.use.unique.column.names, that's all well and fine, but as long as the connection is open, doesn't all code have to be HQL or Hive commands?  If I try to execute SAS SQL, I'll get syntax errors, won't I?  Now, I could disconnect first, but won't my setting for hive.resultset.use.unique.column.names disappear as soon as I disconnect?  Or will the setting persist so long as I don't issue a QUIT in Proc SQL?

 

I suppose I should try it.  Back in a bit...

 

Jim

ChrisNZ
Tourmaline | Level 20

> Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL? 

Yes.

All the SQL that includes   select .. from connection to xxx ()   will be sent to Hive, but all the regular SQL such as select * from SASHELP.CLASS   will stay in SAS, even if a connection is open.

 

You can even open several connections in the same proc sql step. 

jimbarbour
Meteorite | Level 14

Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL?  I'm working on it, and it runs, but it doesn't seem that a setting applied via a connection carries over to a SAS SQL query.

 

Jim

jimbarbour
Meteorite | Level 14

There's some larger issue going on here.  Even with explicit pass through code, I can no longer shut off the table name from being prefixed to the column name.

 

Here are the results from some explicit pass through code.  The first display is just whatever the value is after I make the connection (i.e. the default value).  Note that it is "true".

 

Then, I execute a SET statement and thereafter display the value.  Note that the value is now "false".

 

Lastly, I execute a query -- but the column names still include the table name.  The hive.resultset.use.unique.column.names parameter has no apparent effect.

 

Hive_TableName.ColumnName_2020-09-15_21-39-09.jpg

 

Jim

ChrisNZ
Tourmaline | Level 20

Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL? 

I don't know, though it looks like it from your last post. SAS Tech Support should know.

 

> Then, I execute a SET statement and thereafter display the value.  Note that the value is now "false".

I don't know this option. Ask the Hadoop admins to have a look? Or on a web forum for the ODBC driver (Impala?) 

 

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

I'll have to ask SAS tech support about the settings in a direct connection being applicable to regular Proc SQL code run in the same Proc SQL session.  I suspect they don't apply.  In this case, I switched everything to explicit pass through code because I didn't know how to set hive.resultset.use.unique.column.names any other way.  The odd thing of it is that even though I set the value to 'false', the table names are still prefixed to the column names.  Setting the parameter manually in my query just isn't working.

 

I think the corrective action has to be a change to the Hive XML as described in this SAS usage note:  

 
That's a great idea to check on an ODBC forum.  I'm not sure how many people are connecting to Hadoop via ODBC (usually it's JDBC), but perhaps there is such a forum.  I won't know until I look.
 
Chiefly though, I thank you for pointing out that explicit pass through code (indeed, with multiple connections) can exist alongside regular SAS SQL in a single Proc SQL session.  I was artificially constraining myself without realizing it.  I had a "blinding flash of the obvious" when you pointed out the "box" I was putting myself into.  Many thanks.
 
Jim
ChrisNZ
Tourmaline | Level 20

No worries, glad I could help. 🙂

I suspect each execute/select statement runs in its own environment, even though it is the same connection.

The UNote does not state where that XML file sits. On the client PC or on the Hadoop server?

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

The XML file is on the server side.  It controls the behavior of Hadoop and isn't specifically related to SAS.

 

I've been talking with our platform support folks, but we can't figure out why, when I set the parameter correctly to false, the table names are still being prefixed to the column names.  When our support folks use command line Hive commands, it works fine.  

 

As a work around at least for now, I wrote a macro that removes the table names from the column names after the Hive query completes.  A bit of a kludge, but I prefer not to use 'non_V7_SAS_name'n type column names.

 

Code below if it were of interest.

 

Jim

 

&Null	%MACRO	Remove_Table_From_Column(Lib, Table, Order=VARNUM, Debug=NO);
	%LOCAL	Cmnt;
	%LOCAL	NoPrint;
	%LOCAL	Vars;

	%IF	%QUPCASE(&Debug)	=	YES	%THEN
		%DO;
			%LET	Cmnt	=	;
			%LET	NoPrint	=	;
			%PUT	&Nte1  Debug is on in macro &SysMacroName;
		%END;
	%ELSE
		%DO;
			%LET	Cmnt	=	*;
			%LET	NoPrint	=	NOPRINT;
		%END;

	%&Cmnt.TITLE(BEFORE - Contents of &Lib..&Table);
	PROC	CONTENTS	DATA=&Lib..&Table
						OUT	=WORK.&Table.V
						&Order
						&NoPrint
						;
	RUN;

	PROC	SQL	NOPRINT;
		SELECT	CATS("'", Name, "'n=",SCAN(SUBSTR(Name, INDEXC(Name, '.')), 1))
			INTO	:	Vars	SEPARATED BY ' '
			FROM	WORK.&Table.V
			;
	QUIT;

	%&Cmnt.PUT	&Nte1  &=Vars;

	PROC	DATASETS	LIBRARY=&Lib	NOLIST;
		MODIFY	&Table;
			RENAME	
				&Vars
				;
	QUIT;

	%&Cmnt.TITLE(AFTER - Contents of &Lib..&Table);
	PROC	CONTENTS	DATA=&Lib..&Table
						OUT	=WORK.&Table.V
						&Order
						&NoPrint
						;
	RUN;
%MEND	Remove_Table_From_Column;

GyaniBaba
Obsidian | Level 7
Nice post! Didnyou fibally find out how to set those options from SAS code to hadoop?
Patrick
Opal | Level 21

@jimbarbour Just as an idea that might be worth trying (never done it myself): 

What if you send these 3 set statements via the dbconinit options of the libname statement. The option is documented as being available for the ODBC engine. DBCONINIT= LIBNAME Statement Option

libname myive ODBC ... 
dbconinit="
set hive.resultset.use.unique.column.names = FALSE;
set hive.execution.engine = &Engine; set mapred.job.name = &Type._Members_Direct;
SET CURRENT SQLID='myauthid'
"
;

Then use the libref also in the connect statement to always use the same connection definition however you need it.

connect using myhive;

 

Update: Just realised that this was an old discussion. Still: Getting some confirmation what actually works would be nice.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4808 views
  • 1 like
  • 4 in conversation