Cherry
Obsidian | Level 7

How do I integrate SAS with Hadoop (or SAS VA with Hadoop)? Can someone explain from scratch?

1. How does it work (how do SAS and Hadoop share data and processing)?

2. Which configuration files are involved?

3. Where are changes required? What integration changes?

 

Thanks,

Cherry.

1 ACCEPTED SOLUTION

Accepted Solutions
ronan
Lapis Lazuli | Level 10

Generally speaking, there are several ways to connect SAS with Hadoop, apart from specific MPP engines such as Impala or HAWQ running on Hadoop.

 

As regards SAS 9: the dedicated module SAS/ACCESS Interface to Hadoop offers two levels of integration.

 

1) APIs that translate general-purpose SAS commands into Hadoop commands (mainly HiveQL, HDFS, or Oozie) and convert data between SAS and Hive or HDFS. This covers the SAS DATA step, PROC SQL queries, and some data preparation procedures (FREQ, MEANS, RANK, SORT, SUMMARY, TABULATE, REPORT, TRANSPOSE). Compared with other SAS/ACCESS connectors, SAS/ACCESS to Hadoop is more extensive: in addition to HiveQL requests, it can also natively submit HDFS commands (PROC HADOOP) or Sqoop commands (PROC SQOOP).
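A minimal sketch of this first level, assuming a hypothetical Hive server host, schema, and table names (replace with your own):

```sas
/* LIBNAME hadoop points SAS at Hive; server/schema names are placeholders. */
libname hdp hadoop server="hivenode.example.com" port=10000
        user=myuser schema=default;

/* This query is translated into HiveQL and runs inside Hadoop;
   only the small aggregated result travels back to SAS. */
proc sql;
    select region, count(*) as n
    from hdp.sales
    group by region;
quit;

/* PROC HADOOP submits HDFS commands directly from a SAS session. */
proc hadoop username="myuser" verbose;
    hdfs mkdir="/user/myuser/staging";
    hdfs copyfromlocal="/tmp/sales.csv" out="/user/myuser/staging";
run;
```

The key point is implicit pass-through: SAS decides which parts of the query can be pushed down to Hive and only pulls back what cannot be processed in the cluster.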

Of course, this could be extended with Hadoop RESTful APIs called from custom SAS launchers (PROC HTTP) if necessary.
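For example, a hedged sketch of calling the WebHDFS REST API from PROC HTTP (hypothetical NameNode host; WebHDFS must be enabled on the cluster, and the port varies by Hadoop version):

```sas
filename resp temp;

/* List an HDFS directory through the WebHDFS REST API. */
proc http
    url="http://namenode.example.com:9870/webhdfs/v1/user/myuser?op=LISTSTATUS"
    method="GET"
    out=resp;
run;

/* The JSON reply can be parsed with the JSON libname engine;
   inspect the generated tables before using them. */
libname dirlist json fileref=resp;
proc datasets lib=dirlist; quit;
```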

 

2) APIs that internally create Hadoop jobs (MapReduce, Hive on Tez, even Hive on Spark) and process Hadoop data at a massive, distributed scale using the SAS Embedded Process: the DS2 extension of the SAS DATA step, and SAS in-database analytics products such as the High-Performance procedures. In this configuration, Hive can be bypassed and replaced by SAS's own XML file descriptors stored in HDFS (PROC HDMD).
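A sketch of the in-database style with DS2, assuming the SAS Embedded Process and the In-Database Code Accelerator are deployed on the cluster, and using hypothetical table names:

```sas
/* DS2ACCEL=YES requests in-database execution: the thread program is
   shipped to the Embedded Process and runs next to the data in Hadoop. */
proc ds2 ds2accel=yes;
    thread score_th / overwrite=yes;
        method run();
            set hdp.transactions;   /* read in parallel across the nodes */
            if amount > 1000 then flag = 1;
            else flag = 0;
        end;
    endthread;
    data hdp.scored_transactions (overwrite=yes);
        dcl thread score_th t;
        method run();
            set from t;             /* results stay in Hadoop */
        end;
    enddata;
run;
quit;
```

Unlike approach 1), no data is pulled back to the SAS server here; both input and output remain distributed in the cluster.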

 

As regards SAS Viya (I am less familiar with it), the same two approaches apply: translation versus in-cluster execution (SAS EP). Viya also adds its own features, such as storing and retrieving CAS (= Viya) tables in their native format (SASHDAT) inside Hadoop as plain HDFS files.
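A hedged sketch of the SASHDAT-in-HDFS feature on a co-located CAS/Hadoop deployment (session name, caslib path, and table names are all hypothetical):

```sas
cas mysess;

/* A caslib of source type HDFS, pointing at a directory in the cluster. */
caslib hdat datasource=(srctype="hdfs") path="/user/myuser/casdata";

proc casutil;
    /* Persist an in-memory CAS table as SASHDAT files in HDFS ... */
    save casdata="sales" incaslib="casuser"
         outcaslib="hdat" casout="sales.sashdat";
    /* ... and lift it back into memory later, in parallel. */
    load casdata="sales.sashdat" incaslib="hdat" casout="sales2";
quit;
```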

 

The configuration depends on the products installed: SAS server side only for 1); additional installation and configuration steps on the Hadoop nodes (the Embedded Process) for 2).
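On the SAS server side, the main prerequisite is that SAS can find the Hadoop client JARs and the cluster's configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, etc.), which are typically collected from the cluster with the SAS Deployment Manager. The paths below are illustrative:

```sas
/* In sasv9.cfg, an autoexec, or the session -- point SAS at the
   collected Hadoop JARs and *-site.xml configuration files. */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
```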

 

Even if it is a bit old, see also:

 

https://www.sas.com/content/dam/SAS/en_gb/doc/presentations/user-groups/working-with-sas-hadoop.pdf


4 REPLIES
alexal
SAS Employee

@Cherry ,

 

Let me clarify something: are you talking about the SAS/ACCESS Interface to Hadoop or about co-located HDFS?

Cherry
Obsidian | Level 7

I wanted to know about both. What are the differences between them?

ronan
Lapis Lazuli | Level 10

(See the accepted solution above.)

