Hi,
I have to work with SAS datasets and SAS processes in a Hadoop environment, and I have some questions.
- If I want to use Hive through a LIBNAME statement, I need a SAS/ACCESS Interface to Hadoop license. Is that correct?
- If I want to use a FILENAME statement pointing to HDFS, no license is necessary, is it?
- What license do I need to use PROC HADOOP? And PROC DS2?
- Last question: is there anything similar for working in a Spark environment?
If there is a more appropriate forum for this question, let me know.
Thanks in advance,
Juan
Hive requires SAS/ACCESS Interface to Hadoop, yes.
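A minimal sketch of such a LIBNAME connection, assuming SAS/ACCESS Interface to Hadoop is licensed and the Hadoop client JARs and configuration files have been copied to the SAS machine. The paths, host name, schema, and credentials are placeholders, not values from this thread:

```sas
/* Placeholder paths: point these at the Hadoop client JARs and config files */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";

/* HADOOP engine; server, port, schema, and user are hypothetical */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=juan password="XXXXXXXX";

/* Once assigned, Hive tables are listed like SAS data sets */
proc contents data=hdp._all_ nods;
run;
```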
Filename HDFS is covered by Base SAS, yes.
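As a rough illustration, the FILENAME statement with the HADOOP access method could be used like this. The HDFS path, configuration file, and credentials are assumptions, and depending on the SAS release the configuration may come from the SAS_HADOOP_CONFIG_PATH environment variable instead of CFG=:

```sas
/* cfg= points to a merged Hadoop configuration file (placeholder path) */
filename hfile hadoop "/user/juan/demo/hello.txt"
         cfg="/opt/sas/hadoop/conf/hadoop-config.xml"
         user="juan" pass="XXXXXXXX";

/* Write one line to the HDFS file */
data _null_;
  file hfile;
  put "written from a SAS DATA step";
run;

/* Read the file back and echo each line to the SAS log */
data _null_;
  infile hfile;
  input;
  put _infile_;
run;
```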
PROC HADOOP is also in Base SAS.
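A short sketch of PROC HADOOP issuing HDFS commands, assuming a merged Hadoop configuration file is available to SAS. The file names, directories, and credentials are placeholders, and older releases may expect OPTIONS= rather than CFG= on the PROC statement:

```sas
/* Fileref for the Hadoop configuration file (placeholder path) */
filename cfg "/opt/sas/hadoop/conf/hadoop-config.xml";

proc hadoop cfg=cfg username="juan" password="XXXXXXXX" verbose;
  /* Create a directory in HDFS */
  hdfs mkdir="/user/juan/demo";
  /* Copy a local file into HDFS */
  hdfs copyfromlocal="/tmp/sales.csv" out="/user/juan/demo/sales.csv";
  /* Copy it back to the local file system under a new name */
  hdfs copytolocal="/user/juan/demo/sales.csv" out="/tmp/sales_copy.csv";
run;
```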
PROC DS2 runs in Base SAS, but if you want it to execute within Hadoop, you need the embedded process, which is an add-on to SAS/ACCESS, if I recall correctly.
Spark can be utilized in Data Loader for Hadoop. This is only mentioned once at support.sas.com, so I presume it's still at an early stage.
Thank you for your help, Linush. Very useful.
I would like to know the best way to execute a SAS program in Hadoop if I want to take advantage of Hadoop's parallel
execution. For example, for a SAS program that runs some data steps and procs on data in Hadoop, I want
to benefit from the parallelization of the Hadoop cluster. Which is the best way?
Thanks
If you have a data step that can't be rewritten in SQL, I think rewriting it in PROC DS2 and using the embedded process is currently your only option.
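A rough sketch of what such a DS2 rewrite might look like, assuming the embedded process is installed on the cluster. The libref, table names, and the columns revenue and cost are hypothetical:

```sas
/* Hypothetical Hive connection; see your site's settings for real values */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=juan password="XXXXXXXX";

/* DS2ACCEL=YES asks SAS to run the thread program inside Hadoop */
proc ds2 ds2accel=yes;
  /* The thread program holds the row-level logic of the old data step */
  thread margin_th / overwrite=yes;
    dcl double margin;
    method run();
      set hdp.sales;
      if revenue > 0 then margin = (revenue - cost) / revenue;
    end;
  endthread;

  /* The data program collects the rows produced by the thread */
  data hdp.sales_margin (overwrite=yes);
    dcl thread margin_th t;
    method run();
      set from t;
    end;
  enddata;
run;
quit;
```

If the embedded process is not available, the same program should still run, but the rows are pulled back to the SAS server instead of being processed in the cluster.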
For SQL (given your data is stored in Hive or another relational format), you can make use of implicit or explicit SQL pass-through.
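To make the difference concrete, here is a sketch of both styles, assuming a Hive table named sales with a region column (both hypothetical) and placeholder connection values:

```sas
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=juan password="XXXXXXXX";

/* Show in the log which SQL SAS sends to Hive */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Implicit pass-through: ordinary PROC SQL against the libref;
   SAS tries to translate the query and push it to Hive */
proc sql;
  create table work.by_region as
  select region, count(*) as n
  from hdp.sales
  group by region;
quit;

/* Explicit pass-through: the inner query is HiveQL, sent as written */
proc sql;
  connect to hadoop (server="hive.example.com" port=10000
                     user=juan password="XXXXXXXX");
  create table work.by_region2 as
  select * from connection to hadoop
    (select region, count(*) as n from sales group by region);
  disconnect from hadoop;
quit;
```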
Other procedures that implicitly generate pass-through SQL:
- FREQ
- MEANS
- RANK [Hadoop with Hive 0.13 and later]
- REPORT
- SORT [Hadoop with Hive 0.13 and later]
- SUMMARY
- TABULATE
- TRANSPOSE [Hadoop and Teradata only]
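As an illustration of the procedures listed above, the following sketch runs FREQ and MEANS directly against a Hive table. The table sales and its columns region and revenue are hypothetical, and whether the work is actually pushed down depends on the release and configuration:

```sas
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=juan password="XXXXXXXX";

/* Allow in-database SQL generation and trace the SQL sent to Hive */
options sqlgeneration=dbms;
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Frequency counts: the aggregation can be generated as Hive SQL */
proc freq data=hdp.sales;
  tables region;
run;

/* Summary statistics per region */
proc means data=hdp.sales n mean sum;
  class region;
  var revenue;
run;
```

Checking the SASTRACE output in the log is one way to confirm that the aggregation ran in Hive rather than on the SAS server.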
You will probably find this matrix useful: