Hi,
I am working with Base SAS on a server and I need to connect to a Hadoop-based data lake to get some data from HDFS.
I am using SAS/ACCESS to Hadoop with a libname to Hive, and then I run a query in Hadoop. This type of connection works, but it is very slow. It takes more than an hour to run a Hive query that gets 10,000 rows from a 10-million-row file.
Is there any other way to get data from Hadoop into SAS? Can I connect to Impala via libname? The data lake also has Spark installed.
Any advice will be greatly appreciated
Thanks in advance
As I recall, Hive is very slow for querying and is best used to write data, while Impala is best used to query data.
Yes, I know, but how can I connect to Impala from SAS?
Thanks
I have found this:
https://www.sas.com/en_us/software/access-interface-impala.html
I have another question:
Is there any way to execute SAS code (data steps, PROC SQL and statistical procs) in Hadoop in a distributed way?
Is there any module to do this? If I convert this code to PROC DS2 or PROC FEDSQL, will it be possible?
Thanks in advance
1. SAS/ACCESS or ODBC ( https://www.cloudera.com/downloads/connectors/impala/odbc/2-5-37.html ) allow you to use Impala from SAS
2. You need SAS Embedded Process to execute SAS code within the Hadoop cluster.
Otherwise you are limited to SQL-type queries.
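To make point 1 concrete, here is a minimal sketch of what the connection might look like. It assumes SAS/ACCESS Interface to Impala (or SAS/ACCESS Interface to ODBC plus a configured Impala DSN) is licensed and installed. The host name, port, schema, credentials, DSN name, table and column names are all placeholders I made up, so adjust them to your site.

```sas
/* Sketch only: every host, credential, DSN, table and column name below is a placeholder. */

/* Option A: SAS/ACCESS Interface to Impala, assigned as a library. */
libname imp impala server="impala-host.example.com" port=21050
        schema=default user=myuser password="XXXXXXXX";

/* Option B: explicit pass-through. The inner query is sent to Impala as written, */
/* so the WHERE and LIMIT run in the cluster and only the result rows come back.  */
proc sql;
  connect to impala (server="impala-host.example.com" port=21050
                     schema=default user=myuser password="XXXXXXXX");
  create table work.sample as
    select * from connection to impala
      ( select id, amount
        from big_table
        where load_date = '2019-01-01'
        limit 10000 );
  disconnect from impala;
quit;

/* Option C: generic ODBC engine with a DSN that points at the Impala ODBC driver. */
libname impodbc odbc dsn="ImpalaDSN" user=myuser password="XXXXXXXX";
```

With a plain libname, check the log (for example with options sastrace=',,,d' sastraceloc=saslog;) to confirm that the filter is really passed down to the cluster rather than applied in SAS after pulling the whole table.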
Hello,
Did you find a way to connect to Impala using the Hadoop engine, without SAS/ACCESS to Impala?
Regards,
SS
Thank you Chris for confirming