Connecting SAS to a Hadoop based data lake

Frequent Contributor
Posts: 137

 

Hi,

 

I am working with Base SAS on a server, and I need to connect to a Hadoop-based data lake to read some data from HDFS.

I am using SAS/ACCESS to Hadoop with a libname to Hive, and then I run a query in Hadoop. This type of connection works, but it is very slow: a Hive query that returns 10,000 rows from a 10-million-row file takes more than an hour to run.

 

Is there any other way to get data from Hadoop into SAS? Can I connect to Impala via a libname? The data lake also has Spark installed.

 

Any advice will be greatly appreciated

 

Thanks in advance

PROC Star
Posts: 2,319

Re: Connecting SAS to a Hadoop based data lake

Posted in reply to juanvg1972

As I recall, Hive is very slow for querying and is best used to write data, while Impala is best used to query data.

Frequent Contributor
Posts: 137

Re: Connecting SAS to a Hadoop based data lake

Yes, I know, but how can I connect to Impala from SAS?

 

Thanks

Frequent Contributor
Posts: 137

Re: Connecting SAS to a Hadoop based data lake

Posted in reply to juanvg1972

I have found this:

 

https://www.sas.com/en_us/software/access-interface-impala.html

 

I have another question:

 

Is there any way to execute SAS code (DATA steps, PROC SQL and statistical procedures) on Hadoop in a distributed way?

Is there a module that does this? If I convert the code to PROC DS2 or PROC FEDSQL, will that be possible?

 

Thanks in advance

PROC Star
Posts: 2,319

Re: Connecting SAS to a Hadoop based data lake

Posted in reply to juanvg1972

 

1. Either SAS/ACCESS Interface to Impala or ODBC ( https://www.cloudera.com/downloads/connectors/impala/odbc/2-5-37.html ) lets you use Impala from SAS.

 

2. You need SAS Embedded Process to execute SAS code within the Hadoop cluster.

Otherwise you are limited to SQL-type queries.
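
As a rough sketch of point 1, the connection could look something like the code below. The host name, DSN, table name and column name are placeholders, not values from this thread. The IMPALA engine assumes SAS/ACCESS Interface to Impala is licensed and configured on the SAS server, and authentication options (user and password, or Kerberos) are left out because they depend on the cluster.

```sas
/* Option A: SAS/ACCESS Interface to Impala.
   Host and schema are hypothetical; 21050 is the usual Impala port. */
libname imp impala server="impala-host.example.com" port=21050 schema=default;

/* Option B: generic ODBC through a DSN set up for the Cloudera Impala driver.
   "ImpalaDSN" is an assumed DSN name. */
libname impodbc odbc dsn="ImpalaDSN" schema=default;

/* Explicit pass-through: the inner query runs inside Impala, so only the
   filtered rows are sent back to SAS.
   my_table and load_date are made-up names for illustration. */
proc sql;
  connect to impala (server="impala-host.example.com" port=21050 schema=default);
  create table work.sample as
    select * from connection to impala
      ( select * from my_table where load_date = '2018-01-01' limit 10000 );
  disconnect from impala;
quit;
```

With either libname, pushing the filter into the database (as in the pass-through query) is usually what matters most for speed, since it avoids pulling all 10 million rows into SAS before subsetting.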

 
