Working with SAS in Hadoop and Spark

Frequent Contributor
Posts: 90

Working with SAS in Hadoop and Spark

Hi,

 

I have to work with SAS datasets and SAS processes in a Hadoop environment, and I have some questions.

 

- If I want to use Hive from a LIBNAME statement, I need a SAS/ACCESS Interface to Hadoop license, is that correct?

- If I want to use a FILENAME statement to HDFS, no additional license is necessary, isn't that right?

- What license do I need to use PROC HADOOP? And PROC DS2?

- Last question: is there anything similar for working in a Spark environment?

 

If there is a more appropriate forum for these questions, let me know.

 

Thanks in advance,

 

Juan



All Replies
Esteemed Advisor
Posts: 5,198

Re: Working with SAS in Hadoop and Spark

Hive requires SAS/ACCESS Interface to Hadoop, yes.
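
For the LIBNAME route, something along these lines should work (the server name, port, schema, and user here are just placeholders):

/* Minimal sketch: assign a Hive library via SAS/ACCESS to Hadoop */
libname hdplib hadoop server="hive01.example.com" port=10000
        schema=default user=myuser;

/* Quick check that the connection works */
proc print data=hdplib.sales (obs=10);
run;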

Filename HDFS is covered by Base SAS, yes.
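
For reference, a minimal FILENAME sketch with the Base SAS Hadoop access method (the config file and HDFS path are made up):

/* Read a delimited file from HDFS with Base SAS only */
filename rawtxt hadoop "/user/myuser/data/raw.txt"
         cfg="/opt/sas/hadoop/hadoop-config.xml";

data work.raw;
   infile rawtxt dlm=',' dsd truncover;
   input id name :$20.;
run;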

PROC HADOOP is also in Base SAS.
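
A small PROC HADOOP sketch, again with made-up paths and config file:

/* Submit HDFS commands from SAS */
proc hadoop cfg="/opt/sas/hadoop/hadoop-config.xml" username="myuser" verbose;
   hdfs mkdir="/user/myuser/staging";
   hdfs copyfromlocal="/tmp/raw.txt" out="/user/myuser/staging/raw.txt";
run;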

PROC DS2 runs in Base SAS, but if you want it to execute within Hadoop, you need the embedded process, an add-on to SAS/ACCESS if I recall right.
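
A rough sketch of what a DS2 program pushed down via the embedded process could look like (DS2ACCEL=YES requests in-database execution; the libref and table names are placeholders):

proc ds2 ds2accel=yes;
   /* Thread program: filter rows in parallel on the cluster */
   thread work.filter / overwrite=yes;
      method run();
         set hdplib.sales;               /* Hive table via the SAS/ACCESS libref */
         if amount > 1000 then output;
      end;
   endthread;

   /* Data program: consume the thread and write back to Hadoop */
   data hdplib.big_sales (overwrite=yes);
      dcl thread work.filter t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;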

Spark can be utilized in SAS Data Loader for Hadoop. It is only mentioned once at support.sas.com, so I presume that support is still quite new.

Data never sleeps
Frequent Contributor
Posts: 90

Re: Working with SAS in Hadoop and Spark

Thank you for your help, Linush. Very useful.

 

I would like to know the best way to execute a SAS program in Hadoop if I want to take advantage of Hadoop's parallel execution. For example, for a SAS program that runs some DATA steps and procedures against data stored in Hadoop, I want to benefit from the parallelization of the Hadoop cluster. What is the best way?

 

Thanks

Solution
06-02-2017 02:57 AM
Esteemed Advisor
Posts: 5,198

Re: Working with SAS in Hadoop and Spark

If you have a DATA step that can't be rewritten as SQL, I think rewriting it in PROC DS2 and using the embedded process is currently your only option.

For SQL (given your data is stored in Hive or another relational format), you can make use of implicit or explicit SQL pass-through.
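
An explicit pass-through sketch against Hive could look like this (connection options and table names are placeholders):

/* Explicit SQL pass-through: the inner query runs in Hive */
proc sql;
   connect to hadoop (server="hive01.example.com" port=10000 user=myuser);
   create table work.region_counts as
   select * from connection to hadoop
      ( select region, count(*) as n
          from sales
         group by region );
   disconnect from hadoop;
quit;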

Other procedures that implicitly generate pass-through SQL (see the PROC FREQ sketch after this list):

FREQ
MEANS
RANK [Hadoop with Hive 0.13 and later]
REPORT
SORT [Hadoop with Hive 0.13 and later]
SUMMARY
TABULATE
TRANSPOSE [Hadoop and Teradata only]
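
For example, a quick way to see the implicit pass-through in action is to turn on SASTRACE and run one of these procedures against the Hive libref (libref and table are placeholders):

/* Surface the SQL that SAS generates and sends to Hive */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc freq data=hdplib.sales;
   tables region;
run;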

You will probably find this matrix useful:

http://support.sas.com/documentation/cdl/en/acreldb/69580/HTML/default/viewer.htm#p13td0l6w0329rn15u...

Data never sleeps