I know SAS, SQL, SASTRACE, etc very well, but I'm a newbie to Hive, trying to understand why extractions work, but summarisations generate errors. (Yes I appreciate Hive isn't actually an SQL database, just trying to figure why somethings work & some don't).
ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
Code & logs below - any assistance appreciated.
options SET=SAS_HADOOP_JAR_PATH="/sas94/thirdparty/hadoop/lib";
options sastrace=',,,dsat' sastraceloc=saslog symbolgen mprint mlogic;
options fullstimer;
proc sql stimer;
connect using SAS_HIVE;
select distinct * from connection to SAS_HIVE
(
SELECT year, month, amount
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
);
select distinct * from connection to SAS_HIVE
(
SELECT year, month, min(amount)as min_date
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
);
disconnect from SAS_HIVE;
quit;
33 options fullstimer;
34
35 proc sql stimer;
36 connect using SAS_HIVE;
241 1539345625 no_name 0 SQL (2)
HADOOP_44: Executed: on connection 3 242 1539345625 no_name 0 SQL (2)
USE `hive_sasfm` 243 1539345625 no_name 0 SQL (2)
244 1539345625 no_name 0 SQL (2)
NOTE: SQL Statement used (Total process time):
real time 0.41 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1320.75k
OS Memory 23788.00k
37 select distinct * from connection to SAS_HIVE
38 (
39 SELECT year, month, amount
40 FROM H_TABLE
41 WHERE ( ( year = 2017 ) AND ( month = 10 ) )
42 );
245 1539345626 no_name 0 SQL (2)
HADOOP_45: Prepared: on connection 3 246 1539345626 no_name 0 SQL (2)
SELECT year, month, amount FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) 247 1539345626 no_name 0 SQL (2)
248 1539345626 no_name 0 SQL (2)
249 1539345626 no_name 0 SQL (2)
Summary Statistics for HADOOP are: 250 1539345626 no_name 0 SQL (2)
Total SQL prepare seconds were: 0.111738 251 1539345626 no_name 0 SQL (2)
Total SQL describe seconds were: 0.000496 252 1539345626 no_name 0 SQL (2)
Total seconds used by the HADOOP ACCESS engine were 0.139514 253 1539345626 no_name 0 SQL (2)
254 1539345626 no_name 0 SQL (2)
NOTE: SQL Statement used (Total process time):
real time 0.14 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5260.46k
OS Memory 27820.00k
Timestamp 12/10/2018 11:00:26 PM
44 select distinct * from connection to SAS_HIVE
45 (
46 SELECT year, month, min(amount)as min_date
47 FROM H_TABLE
48 WHERE ( ( year = 2017 ) AND ( month = 10 ) )
49 group by year, month
50 );
255 1539345626 no_name 0 SQL (2)
HADOOP_46: Prepared: on connection 3 256 1539345626 no_name 0 SQL (2)
SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) ) group by year,
month 257 1539345626 no_name 0 SQL (2)
258 1539345626 no_name 0 SQL (2)
259 1539345629 no_name 0 SQL (2)
Summary Statistics for HADOOP are: 260 1539345629 no_name 0 SQL (2)
Total SQL prepare seconds were: 3.649941 261 1539345629 no_name 0 SQL (2)
Total seconds used by the HADOOP ACCESS engine were 3.650159 262 1539345629 no_name 0 SQL (2)
263 1539345629 no_name 0 SQL (2)
ERROR: Prepare error: Error while compiling statement: FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
SQL statement: SELECT year, month, min(amount)as min_date FROM H_TABLE WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: SQL Statement used (Total process time):
52 disconnect from SAS_HIVE;
NOTE: Statement not executed due to NOEXEC option.
NOTE: SQL Statement used (Total process time):
53 quit;
This may be a long shot but there should be space between the function call and the "as" keyword. Try changing "min(amount)as min_date" to "min(amount) as min_date".
Here is the code:
select distinct * from connection to SAS_HIVE
(
SELECT year, month, min(amount) as min_date
FROM H_TABLE
WHERE ( ( year = 2017 ) AND ( month = 10 ) )
group by year, month
);
Like I said, it is a long shot but could fix it.
Best wishes,
Jeff
Thanks @JBailey bit that wasn't the issue.
I tried:
BTW, the table is partitioned by year, month. And it's a managed table (not external).
I did a bit of Googling and found this...
"The YARN Container Memory was smaller than the Spark Executor requirement. I set the YARN Container memory and maximum to be greater than Spark Executor Memory + Overhead. Check 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'."
It sounds like it might be what you are facing.
Best wishes,
Jeff
Issue not solved but at least understood:
Need to revise the security model..
Hi everyone,
I´m facing the some problem. If I understood the architecture of connector SAS interface to Hadoop, It connects to a hive server and after submit processing to a spark cluster.
I configured as this article, setting by default engine and redirecting to a spark cluster.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
But, I got exactly this error when I´m using some where clauses.
@AndrewHowell , did you suceeed to solve this issue ? If yes, how do you get success to run jobs on spark engine ? Can you provide your config or properties you adjusted to run ?
Thanks in advance,
Regardas
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.