When we talk about a data lake, Hadoop is practically synonymous with the implementation platform. Data in Hadoop can be accessed from SAS using either SAS/ACCESS to Hadoop or SAS/ACCESS to ODBC. Each has its place in a data lake, but the former has inherent advantages over the latter. With the introduction of SAS/ACCESS to Spark (which comes with SAS/ACCESS to Hadoop), SAS/ACCESS to Hadoop is even more appealing. Here are 10 benefits of using SAS/ACCESS to Hadoop vs SAS/ACCESS to ODBC.
SAS/ACCESS to ODBC supports only one create table option: DBCREATE_TABLE_OPTS=
SAS/ACCESS to Hadoop has many more options that assist in creating tables, POST_STMT_OPTS= among them.
The most important “secret” that you need to know about the SAS create table options is that they enable you to specify text strings that are placed into the CREATE TABLE and CREATE TABLE AS SELECT (CTAS) statements that SAS generates. Each option is responsible for placing its text string in a different location in the SQL statement. POST_STMT_OPTS= is interesting because its behavior changes based on whether DBIDIRECTEXEC has been enabled.
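To make the idea of "text strings placed into the generated SQL" concrete, here is a minimal sketch using DBCREATE_TABLE_OPTS= as a data set option. The server name, port, schema, credentials, and table name are all hypothetical placeholders, not values from this article, and the exact SQL that SAS generates may differ by release.

```sas
/* Hypothetical connection values; replace with your own Hive server details. */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=myuser password="XXXXXXXX";

/* Write the SQL that SAS generates to the SAS log, so the placement */
/* of the option text can be inspected.                              */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* The text string is appended to the CREATE TABLE statement that    */
/* SAS generates, so the Hive table should be created as ORC.        */
data hdp.class_orc (dbcreate_table_opts='STORED AS ORC');
   set sashelp.class;
run;
```

Checking the log after running a step like this is a simple way to confirm where a given option's text ends up before relying on it in production code.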
The SQOOP procedure enables users to access the Apache Sqoop utility from a SAS session to transfer data between a database and HDFS. Using PROC SQOOP lets you submit Sqoop commands from within your SAS application to your Hadoop cluster.
PROC SQOOP is licensed with SAS/ACCESS® Interface to Hadoop; it is not part of the Base SAS® license. PROC SQOOP is supported in SAS on UNIX and Windows.
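As a rough illustration of submitting a Sqoop command from SAS, the sketch below imports one database table into HDFS. Every host name, port, path, user, password, and table name is a hypothetical placeholder, and it assumes that Oozie is available on the cluster and that the Hadoop configuration files are reachable from the SAS session; check the PROC SQOOP documentation for the setup that your release requires.

```sas
/* All values below are hypothetical placeholders, not values from this article. */
proc sqoop
   hadoopuser='myuser'
   dbuser='dbuser' dbpwd='XXXXXXXX'
   oozieurl='http://oozie.example.com:11000/oozie'
   namenode='hdfs://namenode.example.com:8020'
   jobtracker='resourcemanager.example.com:8032'
   wfhdfspath='hdfs://namenode.example.com:8020/user/myuser/myworkflow.xml'
   deletewf   /* delete the workflow file if it already exists */
   /* The Sqoop command itself: import one table into an HDFS directory. */
   command='import --connect jdbc:sqlserver://db.example.com:1433;database=testdb --table orders --target-dir /user/myuser/orders -m 1';
run;
```

The COMMAND= string is ordinary Sqoop syntax, so an import or export that already works from the command line can usually be pasted in with little change.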
Requirements for SAS In-Database Technologies for Spark
SAS In-Database Technologies for Spark requires SAS In-Database Technologies for Hadoop, which is separately licensed. The SAS Embedded Process for Spark is included with SAS In-Database Technologies for Spark. The SAS Embedded Process must also be installed on the Spark cluster to enable SAS Data Connect Accelerator for Spark. SAS recommends installing the latest version of the SAS Embedded Process.
For information about supported versions and requirements, see the SAS documentation.
In both cases, the YARN Resource Manager UI can be used to examine how the job ran within the Spark execution environment if detailed information is needed. The SAS/CAS logs show information about the results of the job.
SAS In-Database Technologies for Spark is available in SAS Viya 3.4 as a limited-availability option.
Essentially, a SAS/ACCESS to ODBC connection provides only generic data access support. No Hadoop-specific capabilities are added or optimized.