I know SAS, SQL, SASTRACE, etc very well, but I'm a newbie to Hive, trying to understand why extractions work, but summarisations generate errors. (Yes I appreciate Hive isn't actually an SQL database, just trying to figure why somethings work & some don't).
ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
Code & logs below - any assistance appreciated.
options SET=SAS_HADOOP_JAR_PATH="/sas94/thirdparty/hadoop/lib";
options sastrace=',,,dsat' sastraceloc=saslog symbolgen mprint mlogic;
options fullstimer;
proc sql stimer;
connect using SAS_HIVE;
select distinct * from connection to SAS_HIVE
(
SELECT year, month, amount
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
);
select distinct * from connection to SAS_HIVE
(
SELECT year, month, min(amount)as min_date
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
);
disconnect from SAS_HIVE;
quit;
33 options fullstimer;
34
35 proc sql stimer;
36 connect using SAS_HIVE;
241 1539345625 no_name 0 SQL (2)
HADOOP_44: Executed: on connection 3 242 1539345625 no_name 0 SQL (2)
USE `hive_sasfm` 243 1539345625 no_name 0 SQL (2)
244 1539345625 no_name 0 SQL (2)
NOTE: SQL Statement used (Total process time):
real time 0.41 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1320.75k
OS Memory 23788.00k
37 select distinct * from connection to SAS_HIVE
38 (
39 SELECT year, month, amount
40 FROM H_TABLE
41 WHERE ( ( year = 2017 ) AND ( month = 10 ) )
42 );
245 1539345626 no_name 0 SQL (2)
HADOOP_45: Prepared: on connection 3 246 1539345626 no_name 0 SQL (2)
SELECT year, month, amount FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 247 1539345626 no_name 0 SQL (2)
248 1539345626 no_name 0 SQL (2)
249 1539345626 no_name 0 SQL (2)
Summary Statistics for HADOOP are: 250 1539345626 no_name 0 SQL (2)
Total SQL prepare seconds were: 0.111738 251 1539345626 no_name 0 SQL (2)
Total SQL describe seconds were: 0.000496 252 1539345626 no_name 0 SQL (2)
Total seconds used by the HADOOP ACCESS engine were 0.139514 253 1539345626 no_name 0 SQL (2)
254 1539345626 no_name 0 SQL (2)
NOTE: SQL Statement used (Total process time):
real time 0.14 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5260.46k
OS Memory 27820.00k
Timestamp 12/10/2018 11:00:26 PM
44 select distinct * from connection to SAS_HIVE
45 (
46 SELECT year, month, min(amount)as min_date
47 FROM H_TABLE
48 WHERE ( ( year = 2017 ) AND ( month = 10 ) )
49 group by year, month
50 );
255 1539345626 no_name 0 SQL (2)
HADOOP_46: Prepared: on connection 3 256 1539345626 no_name 0 SQL (2)
SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) group by year,
month 257 1539345626 no_name 0 SQL (2)
258 1539345626 no_name 0 SQL (2)
259 1539345629 no_name 0 SQL (2)
Summary Statistics for HADOOP are: 260 1539345629 no_name 0 SQL (2)
Total SQL prepare seconds were: 3.649941 261 1539345629 no_name 0 SQL (2)
Total seconds used by the HADOOP ACCESS engine were 3.650159 262 1539345629 no_name 0 SQL (2)
263 1539345629 no_name 0 SQL (2)
ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: SQL Statement used (Total process time):
52 disconnect from SAS_HIVE;
NOTE: Statement not executed due to NOEXEC option.
NOTE: SQL Statement used (Total process time):
53 quit;
This may be a long shot but there should be space between the function call and the "as" keyword. Try changing "min(amount)as min_date" to "min(amount) as min_date".
Here is the code:
select distinct * from connection to SAS_HIVE
(
SELECT year, month, min(amount) as min_date
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
);
Like I said, it is a long shot but could fix it.
Best wishes,
Jeff
Thanks @JBailey bit that wasn't the issue.
I tried:
BTW, the table is partitioned by year, month. And it's a managed table (not external).
I did a bit of Googling and found this...
"The YARN Container Memory was smaller than the Spark Executor requirement. I set the YARN Container memory and maximum to be greater than Spark Executor Memory + Overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'."
It sounds like it might be what you are facing.
Best wishes,
Jeff
Issue not solved but at least understood:
Need to revise the security model..
Hi everyone,
I´m facing the some problem. If I understood the architecture of connector SAS interface to Hadoop, It connects to a hive server and after submit processing to a spark cluster.
I configured as this article, setting by default engine and redirecting to a spark cluster.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
But, I got exactly this error when I´m using some where clauses.
@AndrewHowell , did you suceeed to solve this issue ? If yes, how do you get success to run jobs on spark engine ? Can you provide your config or properties you adjusted to run ?
Thanks in advance,
Regardas
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.