Re: SQL on Hive error: Failed to create spark client

AndrewHowell · Posted 10-12-2018 08:16 AM

I know SAS, SQL, SASTRACE, etc very well, but I'm a newbie to Hive, trying to understand why extractions work, but summarisations generate errors. (Yes I appreciate Hive isn't actually an SQL database, just trying to figure why somethings work & some don't).

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:

org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.

SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )

group by year, month

Code & logs below - any assistance appreciated.

options SET=SAS_HADOOP_JAR_PATH="/sas94/thirdparty/hadoop/lib";
options sastrace=',,,dsat' sastraceloc=saslog symbolgen mprint mlogic;

options fullstimer;

proc sql stimer;
	connect using SAS_HIVE; 
	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, amount
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
	);

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount)as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

	disconnect from SAS_HIVE;
quit;


33         options fullstimer;
34         
35         proc sql stimer;
36         	connect using SAS_HIVE;

  241 1539345625 no_name 0 SQL (2) 
HADOOP_44: Executed: on connection 3 242 1539345625 no_name 0 SQL (2) 
USE `hive_sasfm` 243 1539345625 no_name 0 SQL (2) 
  244 1539345625 no_name 0 SQL (2) 
NOTE: SQL Statement used (Total process time):
      real time           0.41 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1320.75k
      OS Memory           23788.00k
      
37         	select distinct * from connection to SAS_HIVE
38         	(
39         		SELECT year, month, amount
40         			FROM H_TABLE
41         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
42         	);

  245 1539345626 no_name 0 SQL (2) 
HADOOP_45: Prepared: on connection 3 246 1539345626 no_name 0 SQL (2) 
SELECT year, month, amount FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 247 1539345626 no_name 0 SQL (2) 
  248 1539345626 no_name 0 SQL (2) 
  249 1539345626 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 250 1539345626 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     0.111738 251 1539345626 no_name 0 SQL (2) 
Total SQL describe seconds were:                    0.000496 252 1539345626 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     0.139514 253 1539345626 no_name 0 SQL (2) 
  254 1539345626 no_name 0 SQL (2) 

NOTE: SQL Statement used (Total process time):
      real time           0.14 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              5260.46k
      OS Memory           27820.00k
      Timestamp           12/10/2018 11:00:26 PM

44         	select distinct * from connection to SAS_HIVE
45         	(
46         		SELECT year, month, min(amount)as min_date
47         			FROM H_TABLE
48         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
49         			group by year, month
50         	);

  255 1539345626 no_name 0 SQL (2) 
HADOOP_46: Prepared: on connection 3 256 1539345626 no_name 0 SQL (2) 
SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) group by year, 
month 257 1539345626 no_name 0 SQL (2) 
  258 1539345626 no_name 0 SQL (2) 
  259 1539345629 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 260 1539345629 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     3.649941 261 1539345629 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     3.650159 262 1539345629 no_name 0 SQL (2) 
  263 1539345629 no_name 0 SQL (2) 

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session: 
       org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 
       group by year, month

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: SQL Statement used (Total process time):
      
52         	disconnect from SAS_HIVE;

NOTE: Statement not executed due to NOEXEC option.
NOTE: SQL Statement used (Total process time):
      
53         quit;

JBailey · Posted 10-12-2018 09:48 AM

Hi @AndrewHowell

This may be a long shot but there should be space between the function call and the "as" keyword. Try changing "min(amount)as min_date" to "min(amount) as min_date".

Here is the code:

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount) as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

Like I said, it is a long shot but could fix it.

Best wishes,

Jeff

AndrewHowell · Posted 10-14-2018 09:45 AM

Thanks @JBailey bit that wasn't the issue.

I tried:

min(amount)
min(amount) min_date
min(amount) as min_date
all the same error.

BTW, the table is partitioned by year, month. And it's a managed table (not external).

JBailey · Posted 10-14-2018 07:59 PM

Hi @AndrewHowell

I did a bit of Googling and found this...

"The YARN Container Memory was smaller than the Spark Executor requirement. I set the YARN Container memory and maximum to be greater than Spark Executor Memory + Overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'."

It sounds like it might be what you are facing.

Best wishes,
Jeff

AndrewHowell · Posted 10-15-2018 10:33 PM

Issue not solved but at least understood:

The Hive database in question is public - anyone with access to the cluster could access (and extract) data.
However, if aggregation was required, a Spark session would start, but would fail for any user other than the owner of the database.

Need to revise the security model..

BigRider · Posted 02-24-2022 08:18 AM

Hi everyone,

I´m facing the some problem. If I understood the architecture of connector SAS interface to Hadoop, It connects to a hive server and after submit processing to a spark cluster.

I configured as this article, setting by default engine and redirecting to a spark cluster.

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

But, I got exactly this error when I´m using some where clauses.

@AndrewHowell , did you suceeed to solve this issue ? If yes, how do you get success to run jobs on spark engine ? Can you provide your config or properties you adjusted to run ?

Thanks in advance,

Regardas

SQL on Hive error: Failed to create spark client