How can I set a Hive property when using regular Proc SQL?

jimbarbour · Posted 09-15-2020 07:44 PM

I want to set a Hive property as follows:

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

The above command works fine in explicit pass through SQL, like so:

PROC	SQL;
	CONNECT	TO	&SAS_Eng	AS	&Conn	(&Connect);

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

	EXECUTE(set	hive.execution.engine						=	&Engine) 					BY	&Conn;

	EXECUTE(set	mapred.job.name								=	&Type._Members_Direct)		BY	&Conn;

	CREATE	TABLE	WORK.Members							AS
		SELECT	*	
			FROM	CONNECTION	TO	&Conn					(
				SELECT	*
					FROM	&Schema..ALL_MEMBERS
						);

	DISCONNECT	FROM	&Conn;
QUIT;

However, a lot of our analysts don't want to code all the details of explicit pass through code for every little query. They just want to use regular Proc SQL. Normally, I'd code a PROPERTIES= option on a Libname using the Hadoop engine, but on this particular machine, we only have ODBC licensed. When I try to code a PROPERTIES= on the Libname with an ODBC engine, I get:

ERROR 22-7: Invalid option name PROPERTIES.

Apparently, PROPERTIES is not a valid option when using the ODBC engine.

Can anyone tell me how to set a Hive property with regular Proc SQL when only the ODBC engine is available?

Jim

ChrisNZ · Posted 09-15-2020 08:20 PM

Why not call

proc sql;

&sql_hadoop_header;

create table ...

quit;

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 09-15-2020 08:28 PM

@ChrisNZ,

Thank you.

So, would &SQL_Hadoop_Header resolve to something like the following?

	CONNECT	TO	&SAS_Eng	AS	&Conn	(&Connect);

	EXECUTE(set	hive.resultset.use.unique.column.names		=	FALSE) 						BY	&Conn;

	EXECUTE(set	hive.execution.engine						=	&Engine) 					BY	&Conn;

	EXECUTE(set	mapred.job.name								=	&Type._Members_Direct)		BY	&Conn;

That's certainly workable, but then all queries have to be coded in HQL instead of SAS SQL. One advantage of using just regular Proc SQL is that our analysts can code in a syntax that they're already familiar with and can use SAS functions (e.g. date functions, string functions, etc.) instead of having to learn the Hadoop/Hive equivalents. It may be necessary to go that route if we want to set the hive.resultset.use.unique.column.names, but we'd rather not if there's another way.

I'm a bit frustrated. It's such a simple thing, and I know of two different ways to do it, unfortunately, neither of which works in this instance.

Jim

ChrisNZ · Posted 09-15-2020 09:21 PM

> all queries have to be coded in HQL instead of SAS SQL

Not necessarily, you can have SAS code and/or native code. And you could have parameters too.

proc sql;

%SQL_Hadoop_Header(connection=&conxn, options=nouniquename);

select datepart(DATE) from connection to &conxn ( hql... );

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 09-15-2020 10:01 PM

@ChrisNZ,

Oh, dear. I think you lost me there. Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL?

If I execute explicit pass through code and set the hive.resultset.use.unique.column.names, that's all well and fine, but as long as the connection is open, doesn't all code have to be HQL or Hive commands? If I try to execute SAS SQL, I'll get syntax errors, won't I? Now, I could disconnect first, but won't my setting for hive.resultset.use.unique.column.names disappear as soon as I disconnect? Or will the setting persist so long as I don't issue a QUIT in Proc SQL?

I suppose I should try it. Back in a bit...

Jim

ChrisNZ · Posted 09-15-2020 10:33 PM

> Are you saying that in one instance of Proc SQL I could have a mix of explicit pass through code AND also SAS SQL?

Yes.

All the SQL that includes select .. from connection to xxx () will be sent to Hive, but all the regular SQL such as select * from SASHELP.CLASS will stay in SAS, even if a connection is open.

You can even open several connections in the same proc sql step.

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 09-15-2020 11:30 PM

Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL? I'm working on it, and it runs, but it doesn't seem that a setting applied via a connection carries over to a SAS SQL query.

Jim

jimbarbour · Posted 09-16-2020 12:50 AM

There's some larger issue going on here. Even with explicit pass through code, I can no longer shut off the table name from being prefixed to the column name.

Here are the results from some explicit pass through code. The first display is just whatever the value is after I make the connection (i.e. the default value). Note that it is "true".

Then, I execute a SET statement and thereafter display the value. Note that the value is now "false".

Lastly, I execute a query -- but the column names still include the table name. The hive.resultset.use.unique.column.names parameter has no apparent effect.

Jim

ChrisNZ · Posted 09-16-2020 06:01 PM

> Well, right, that makes sense, but will the properties I set in a connection apply to a query run using SAS SQL?

I don't know, though it looks like it from your last post. SAS Tech Support should know.

> Then, I execute a SET statement and thereafter display the value. Note that the value is now "false".

I don't know this option. Ask the Hadoop admins to have a look? Or on a web forum for the ODBC driver (Impala?)

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 09-16-2020 09:16 PM

@ChrisNZ,

I'll have to ask SAS tech support about the settings in a direct connection being applicable to regular Proc SQL code run in the same Proc SQL session. I suspect they don't apply. In this case, I switched everything to explicit pass through code because I didn't know how to set hive.resultset.use.unique.column.names any other way. The odd thing of it is that even though I set the value to 'false', the table names are still prefixed to the column names. Setting the parameter manually in my query just isn't working.

I think the corrective action has to be a change to the Hive XML as described in this SAS usage note:

https://support.sas.com/kb/53/568.html

That's a great idea to check on an ODBC forum. I'm not sure how many people are connecting to Hadoop via ODBC (usually it's JDBC), but perhaps there is such a forum. I won't know until I look.

Chiefly though, I thank you for pointing out that explicit pass through code (indeed, with multiple connections) can exist alongside regular SAS SQL in a single Proc SQL session. I was artificially constraining myself without realizing it. I had a "blinding flash of the obvious" when you pointed out the "box" I was putting myself into. Many thanks.

Jim

ChrisNZ · Posted 09-16-2020 11:58 PM

No worries, glad I could help. 🙂

I suspect each execute/select statement runs in its own environment, even though it is the same connection.

The UNote does not state where that XML file sits. On the client PC or on the Hadoop server?

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 09-18-2020 01:29 AM

@ChrisNZ,

The XML file is on the server side. It controls the behavior of Hadoop and isn't specifically related to SAS.

I've been talking with our platform support folks, but we can't figure out why, when I set the parameter correctly to false, the table names are still being prefixed to the column names. When our support folks use command line Hive commands, it works fine.

As a work around at least for now, I wrote a macro that removes the table names from the column names after the Hive query completes. A bit of a kludge, but I prefer not to use 'non_V7_SAS_name'n type column names.

Code below if it were of interest.

Jim

&Null	%MACRO	Remove_Table_From_Column(Lib, Table, Order=VARNUM, Debug=NO);
	%LOCAL	Cmnt;
	%LOCAL	NoPrint;
	%LOCAL	Vars;

	%IF	%QUPCASE(&Debug)	=	YES	%THEN
		%DO;
			%LET	Cmnt	=	;
			%LET	NoPrint	=	;
			%PUT	&Nte1  Debug is on in macro &SysMacroName;
		%END;
	%ELSE
		%DO;
			%LET	Cmnt	=	*;
			%LET	NoPrint	=	NOPRINT;
		%END;

	%&Cmnt.TITLE(BEFORE - Contents of &Lib..&Table);
	PROC	CONTENTS	DATA=&Lib..&Table
						OUT	=WORK.&Table.V
						&Order
						&NoPrint
						;
	RUN;

	PROC	SQL	NOPRINT;
		SELECT	CATS("'", Name, "'n=",SCAN(SUBSTR(Name, INDEXC(Name, '.')), 1))
			INTO	:	Vars	SEPARATED BY ' '
			FROM	WORK.&Table.V
			;
	QUIT;

	%&Cmnt.PUT	&Nte1  &=Vars;

	PROC	DATASETS	LIBRARY=&Lib	NOLIST;
		MODIFY	&Table;
			RENAME	
				&Vars
				;
	QUIT;

	%&Cmnt.TITLE(AFTER - Contents of &Lib..&Table);
	PROC	CONTENTS	DATA=&Lib..&Table
						OUT	=WORK.&Table.V
						&Order
						&NoPrint
						;
	RUN;
%MEND	Remove_Table_From_Column;

How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Re: How can I set a Hive property when using regular Proc SQL?

Classroom Training Available!