BookmarkSubscribeRSS Feed
AndrewHowell
Moderator

I know SAS, SQL, SASTRACE, etc very well, but I'm a newbie to Hive, trying to understand why extractions work, but summarisations generate errors. (Yes I appreciate Hive isn't actually an SQL database, just trying to figure why somethings work & some don't).

 

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:

       org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.

SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )

       group by year, month

 

Code & logs below - any assistance appreciated.

 

options SET=SAS_HADOOP_JAR_PATH="/sas94/thirdparty/hadoop/lib";
options sastrace=',,,dsat' sastraceloc=saslog symbolgen mprint mlogic;

options fullstimer;

proc sql stimer;
	connect using SAS_HIVE; 
	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, amount
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
	);

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount)as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

	disconnect from SAS_HIVE;
quit;


33         options fullstimer;
34         
35         proc sql stimer;
36         	connect using SAS_HIVE;

  241 1539345625 no_name 0 SQL (2) 
HADOOP_44: Executed: on connection 3 242 1539345625 no_name 0 SQL (2) 
USE `hive_sasfm` 243 1539345625 no_name 0 SQL (2) 
  244 1539345625 no_name 0 SQL (2) 
NOTE: SQL Statement used (Total process time):
      real time           0.41 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1320.75k
      OS Memory           23788.00k
      
37         	select distinct * from connection to SAS_HIVE
38         	(
39         		SELECT year, month, amount
40         			FROM H_TABLE
41         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
42         	);

  245 1539345626 no_name 0 SQL (2) 
HADOOP_45: Prepared: on connection 3 246 1539345626 no_name 0 SQL (2) 
SELECT year, month, amount FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 247 1539345626 no_name 0 SQL (2) 
  248 1539345626 no_name 0 SQL (2) 
  249 1539345626 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 250 1539345626 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     0.111738 251 1539345626 no_name 0 SQL (2) 
Total SQL describe seconds were:                    0.000496 252 1539345626 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     0.139514 253 1539345626 no_name 0 SQL (2) 
  254 1539345626 no_name 0 SQL (2) 

NOTE: SQL Statement used (Total process time):
      real time           0.14 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              5260.46k
      OS Memory           27820.00k
      Timestamp           12/10/2018 11:00:26 PM

44         	select distinct * from connection to SAS_HIVE
45         	(
46         		SELECT year, month, min(amount)as min_date
47         			FROM H_TABLE
48         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
49         			group by year, month
50         	);

  255 1539345626 no_name 0 SQL (2) 
HADOOP_46: Prepared: on connection 3 256 1539345626 no_name 0 SQL (2) 
SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) group by year, 
month 257 1539345626 no_name 0 SQL (2) 
  258 1539345626 no_name 0 SQL (2) 
  259 1539345629 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 260 1539345629 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     3.649941 261 1539345629 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     3.650159 262 1539345629 no_name 0 SQL (2) 
  263 1539345629 no_name 0 SQL (2) 

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session: 
       org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 
       group by year, month

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: SQL Statement used (Total process time):
      
52         	disconnect from SAS_HIVE;

NOTE: Statement not executed due to NOEXEC option.
NOTE: SQL Statement used (Total process time):
      
53         quit;
5 REPLIES 5
JBailey
Barite | Level 11

Hi @AndrewHowell

 

This may be a long shot but there should be space between the function call and the "as" keyword. Try changing "min(amount)as min_date" to "min(amount) as min_date".

 

Here is the code:

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount) as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

Like I said, it is a long shot but could fix it.

 

Best wishes,

Jeff

 

 

 

 

AndrewHowell
Moderator

Thanks @JBailey bit that wasn't the issue.

 

I tried:

  • min(amount)
  • min(amount) min_date
  • min(amount) as min_date
  • all the same error.

BTW, the table is partitioned by year, month. And it's a managed table (not external).

 

JBailey
Barite | Level 11

Hi @AndrewHowell

 

I did a bit of Googling and found this...

 

"The YARN Container Memory was smaller than the Spark Executor requirement. I set the YARN Container memory and maximum to be greater than Spark Executor Memory + Overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'."

 

It sounds like it might be what you are facing.

 

Best wishes,
Jeff

AndrewHowell
Moderator

Issue not solved but at least understood:

  • The Hive database in question is public - anyone with access to the cluster could access (and extract) data.
  • However, if aggregation was required, a Spark session would start, but would fail for any user other than the owner of the database.

Need to revise the security model..

BigRider
Obsidian | Level 7

Hi everyone,

 

I´m facing the some problem. If I understood the architecture of connector SAS interface to Hadoop, It connects to a hive server and after submit processing to a spark cluster.

 

I configured as this article, setting by default engine and redirecting to a spark cluster.

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

 

But, I got exactly this error when I´m using some where clauses.

 

@AndrewHowell , did you suceeed to solve this issue ? If yes, how do you get success to run jobs on spark engine ? Can you provide your config or properties you adjusted to run ?

 

Thanks in advance,

 

Regardas

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 4330 views
  • 0 likes
  • 3 in conversation