BookmarkSubscribeRSS Feed
AndrewHowell
Moderator

I know SAS, SQL, SASTRACE, etc very well, but I'm a newbie to Hive, trying to understand why extractions work, but summarisations generate errors. (Yes I appreciate Hive isn't actually an SQL database, just trying to figure why somethings work & some don't).

 

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:

       org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.

SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )

       group by year, month

 

Code & logs below - any assistance appreciated.

 

options SET=SAS_HADOOP_JAR_PATH="/sas94/thirdparty/hadoop/lib";
options sastrace=',,,dsat' sastraceloc=saslog symbolgen mprint mlogic;

options fullstimer;

proc sql stimer;
	connect using SAS_HIVE; 
	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, amount
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
	);

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount)as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

	disconnect from SAS_HIVE;
quit;


33         options fullstimer;
34         
35         proc sql stimer;
36         	connect using SAS_HIVE;

  241 1539345625 no_name 0 SQL (2) 
HADOOP_44: Executed: on connection 3 242 1539345625 no_name 0 SQL (2) 
USE `hive_sasfm` 243 1539345625 no_name 0 SQL (2) 
  244 1539345625 no_name 0 SQL (2) 
NOTE: SQL Statement used (Total process time):
      real time           0.41 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              1320.75k
      OS Memory           23788.00k
      
37         	select distinct * from connection to SAS_HIVE
38         	(
39         		SELECT year, month, amount
40         			FROM H_TABLE
41         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
42         	);

  245 1539345626 no_name 0 SQL (2) 
HADOOP_45: Prepared: on connection 3 246 1539345626 no_name 0 SQL (2) 
SELECT year, month, amount FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 247 1539345626 no_name 0 SQL (2) 
  248 1539345626 no_name 0 SQL (2) 
  249 1539345626 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 250 1539345626 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     0.111738 251 1539345626 no_name 0 SQL (2) 
Total SQL describe seconds were:                    0.000496 252 1539345626 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     0.139514 253 1539345626 no_name 0 SQL (2) 
  254 1539345626 no_name 0 SQL (2) 

NOTE: SQL Statement used (Total process time):
      real time           0.14 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              5260.46k
      OS Memory           27820.00k
      Timestamp           12/10/2018 11:00:26 PM

44         	select distinct * from connection to SAS_HIVE
45         	(
46         		SELECT year, month, min(amount)as min_date
47         			FROM H_TABLE
48         			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
49         			group by year, month
50         	);

  255 1539345626 no_name 0 SQL (2) 
HADOOP_46: Prepared: on connection 3 256 1539345626 no_name 0 SQL (2) 
SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) group by year, 
month 257 1539345626 no_name 0 SQL (2) 
  258 1539345626 no_name 0 SQL (2) 
  259 1539345629 no_name 0 SQL (2) 
Summary Statistics for HADOOP are: 260 1539345629 no_name 0 SQL (2) 
Total SQL prepare seconds were:                     3.649941 261 1539345629 no_name 0 SQL (2) 
Total seconds used by the HADOOP ACCESS engine were     3.650159 262 1539345629 no_name 0 SQL (2) 
  263 1539345629 no_name 0 SQL (2) 

ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session: 
       org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 
       group by year, month

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: SQL Statement used (Total process time):
      
52         	disconnect from SAS_HIVE;

NOTE: Statement not executed due to NOEXEC option.
NOTE: SQL Statement used (Total process time):
      
53         quit;
5 REPLIES 5
JBailey
Barite | Level 11

Hi @AndrewHowell

 

This may be a long shot but there should be space between the function call and the "as" keyword. Try changing "min(amount)as min_date" to "min(amount) as min_date".

 

Here is the code:

	select distinct * from connection to SAS_HIVE
	(
		SELECT year, month, min(amount) as min_date
			FROM H_TABLE
			WHERE  ( ( year = 2017 ) AND  ( month = 10 ) )
			group by year, month
	);

Like I said, it is a long shot but could fix it.

 

Best wishes,

Jeff

 

 

 

 

AndrewHowell
Moderator

Thanks @JBailey bit that wasn't the issue.

 

I tried:

  • min(amount)
  • min(amount) min_date
  • min(amount) as min_date
  • all the same error.

BTW, the table is partitioned by year, month. And it's a managed table (not external).

 

JBailey
Barite | Level 11

Hi @AndrewHowell

 

I did a bit of Googling and found this...

 

"The YARN Container Memory was smaller than the Spark Executor requirement. I set the YARN Container memory and maximum to be greater than Spark Executor Memory + Overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'."

 

It sounds like it might be what you are facing.

 

Best wishes,
Jeff

AndrewHowell
Moderator

Issue not solved but at least understood:

  • The Hive database in question is public - anyone with access to the cluster could access (and extract) data.
  • However, if aggregation was required, a Spark session would start, but would fail for any user other than the owner of the database.

Need to revise the security model..

BigRider
Obsidian | Level 7

Hi everyone,

 

I´m facing the some problem. If I understood the architecture of connector SAS interface to Hadoop, It connects to a hive server and after submit processing to a spark cluster.

 

I configured as this article, setting by default engine and redirecting to a spark cluster.

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

 

But, I got exactly this error when I´m using some where clauses.

 

@AndrewHowell , did you suceeed to solve this issue ? If yes, how do you get success to run jobs on spark engine ? Can you provide your config or properties you adjusted to run ?

 

Thanks in advance,

 

Regardas

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3550 views
  • 0 likes
  • 3 in conversation