How do I integrate SAS with Hadoop (or SAS VA with Hadoop)? Can someone explain from scratch?
1. How it works (how SAS and Hadoop work together and share data and processing)
2. Which configuration files are involved
3. Where are changes required? Integration changes?
Thanks,
Cherry.
Generally speaking, there are several ways to connect SAS with Hadoop, apart from specific MPP engines such as Impala or HAWQ running on Hadoop.
As regards SAS 9: the specific module SAS/ACCESS Interface to Hadoop offers two levels of integration.
1) APIs that translate general-purpose SAS commands into Hadoop commands (mainly HiveQL, HDFS or Oozie) and convert data between SAS and Hive or HDFS: the SAS DATA step, PROC SQL queries and some data preparation procedures (FREQ, MEANS, RANK, SORT, SUMMARY, TABULATE, REPORT and TRANSPOSE). Compared with other SAS/ACCESS connectors, SAS/ACCESS to Hadoop is more extensive: in addition to HiveQL requests, it can also natively submit HDFS commands (PROC HADOOP) or Sqoop commands (PROC SQOOP).
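As a minimal sketch of level 1), the LIBNAME statement below connects to Hive so that ordinary SAS steps can be translated to HiveQL; the server, port, schema and table names are all hypothetical, so adjust them for your cluster:

```sas
/* Hypothetical Hive connection details -- adjust for your cluster */
libname hdp hadoop server="hive.example.com" port=10000
        user=myuser schema=default;

/* This PROC SQL summary can be passed down to Hive as a HiveQL query */
proc sql;
   select dept, count(*) as headcount
   from hdp.employees
   group by dept;
quit;

/* A supported procedure such as FREQ can also push work into Hive */
proc freq data=hdp.employees;
   tables dept;
run;
```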
Of course, this could be extended with the Hadoop RESTful APIs and some SAS custom launchers (PROC HTTP) if necessary.
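For example, here is a hedged sketch of PROC HADOOP submitting HDFS commands directly; the configuration file path, user name and HDFS paths are made up:

```sas
/* Hypothetical paths: cfg= points at the Hadoop client configuration */
proc hadoop cfg="/opt/sas/hadoopcfg/combined-site.xml"
            username="myuser" verbose;
   hdfs mkdir="/user/myuser/staging";
   hdfs copyfromlocal="/tmp/sales.csv" out="/user/myuser/staging/sales.csv";
run;
```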
2) APIs that internally create Hadoop jobs (MapReduce, Hive on Tez, even Hive on Spark) and leverage Hadoop data at a massive, distributed scale using the SAS Embedded Process: the DS2 extension of the SAS DATA step, and SAS in-database analytics products such as the High-Performance procedures. In this configuration, Hive can be bypassed and replaced by SAS's own XML file descriptors stored in HDFS (PROC HDMD).
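As an illustration of level 2), a High-Performance procedure can run alongside the data once the SAS Embedded Process is deployed on the cluster. This is only a sketch: the libref, table and model variables are invented:

```sas
/* hdp.transactions is a hypothetical Hive table reached via SAS/ACCESS */
proc hplogistic data=hdp.transactions;
   class region;
   model defaulted = amount income;
   /* Ask the procedure to execute in distributed mode, inside Hadoop */
   performance nodes=all details;
run;
```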
As regards SAS Viya (I am less familiar with it), the same applies equally: translation vs. internal manipulation (SAS EP). It also adds its own features, such as storing and retrieving CAS (= Viya) tables in their native format (SASHDAT) inside Hadoop as plain HDFS files.
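In Viya, this can be sketched with a CASLIB of HDFS type; the path and table names below are assumptions:

```sas
/* Hypothetical caslib pointing at an HDFS directory */
caslib hdat datasource=(srctype="hdfs") path="/user/myuser/casdata";

proc casutil;
   /* Load a sample table into CAS, then save it back as a SASHDAT file */
   load data=sashelp.cars outcaslib="hdat" casout="cars";
   save casdata="cars" incaslib="hdat" outcaslib="hdat";
quit;
```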
The configuration depends on the products installed: SAS server side only for 1); additional installation and configuration steps on the Hadoop nodes for 2).
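For case 1), the SAS 9 client side mainly needs two environment variables pointing at the Hadoop JAR files and configuration files collected from the cluster; the directories below are hypothetical:

```sas
/* Directories populated from the cluster (for example with the SAS
   Hadoop tracer script); the paths themselves are made up */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoopjars";
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoopcfg";
```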
Even if this is a bit old, see also :
https://www.sas.com/content/dam/SAS/en_gb/doc/presentations/user-groups/working-with-sas-hadoop.pdf
@Cherry ,
Let me clarify something, are you talking about SAS/ACCESS Interface to Hadoop or Co-located HDFS?
I wanted to know about both. What are the differences between them?