06-07-2016 12:27 AM
I'm new to SAS and Cloudera(Hadoop).
I want to configure SAS so it can connect to Cloudera (Hadoop).
I want to use the SAS Deployment manager for configuration.
Hadoop is the open-source "big data" platform. I'm using the Cloudera distribution of Hadoop.
My hadoop nodes are b1, b2, b3. Cloudera manager is on node b2.
The SAS client is udw.
I'm told this process involves:
1. Installing the SAS Embedded Process on all Hadoop nodes (name and data nodes)
2. Installing the Hadoop JAR files (Java archives) on the SAS client
Is this true?
Would I open firewall ports 7180 and 7183 between udw (the SAS client) and b2? Would other ports need to be opened?
06-09-2016 04:05 AM
To connect SAS/ACCESS to your Cloudera Hadoop cluster, your SAS server will need a set of JAR files and a few configuration files (the Hadoop client files, such as core-site.xml and mapred-site.xml).
During the deployment you can use the SAS Deployment Manager to collect those files (see p. 18 in http://support.sas.com/resources/thirdpartysupport/v94/hadoop/hadoopbacg.pdf ).
As explained in that document there are some requirements; one of them is being able to contact the Hadoop cluster manager. So yes, you will need to open port 7180 on the machine hosting Cloudera Manager.
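Before running the deployment, a quick reachability check from the SAS client (udw) can confirm the Cloudera Manager ports are actually open. A minimal sketch, assuming the host name b2 and ports 7180/7183 from this thread, using bash's /dev/tcp pseudo-device so no extra tools are required:

```shell
# Check from udw that Cloudera Manager on b2 answers on the ports
# the SAS Deployment Manager needs. Host names and port numbers are
# the ones mentioned in this thread; adjust for your environment.
check_port() {
  # Prints "open" if a TCP connection to host $1, port $2 succeeds
  # within 5 seconds, and "closed" otherwise.
  if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

check_port b2 7180   # Cloudera Manager HTTP port
check_port b2 7183   # Cloudera Manager HTTPS port (if TLS is enabled)
```

If either check prints "closed", ask your network team to open the port between udw and b2 before retrying the deployment.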
If you plan to use only SAS/ACCESS to Hadoop, you don't need to deploy the SAS Embedded Process.
The SAS Embedded Process is licensed separately and is required for products such as SAS Scoring Accelerator, SAS In-Database Code Accelerator, or SAS Data Quality Accelerator, and for some use cases like asymmetric HPA, parallel LASR lift from Hadoop, etc.
Note: if, for some reason, the SDM fails to collect the files, you can still use the manual method (download the appropriate JAR and configuration files from your Hadoop cluster to the SAS server; a sample list of JAR files is provided in the appendix of the document).
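The manual method can be sketched as below. All paths are illustrative assumptions (Cloudera parcels commonly live under /opt/cloudera/parcels/CDH, but verify the locations on your own cluster); only the host name b2 comes from this thread. The two environment variables at the end are the ones SAS/ACCESS to Hadoop reads to locate the collected files.

```shell
# Manual fallback: copy the Hadoop client JARs and config files from a
# cluster node (b2) to the SAS server, then point SAS at them.
# Directory and remote paths below are assumptions for illustration.
JAR_DIR="$HOME/sas-hadoop/jars"
CONF_DIR="$HOME/sas-hadoop/conf"
mkdir -p "$JAR_DIR" "$CONF_DIR"

# Pull the Hadoop client JARs from the cluster node.
scp 'b2:/opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar' "$JAR_DIR/"

# Pull the client configuration files (core-site.xml, hdfs-site.xml,
# mapred-site.xml, yarn-site.xml, hive-site.xml).
scp 'b2:/etc/hadoop/conf/*-site.xml' "$CONF_DIR/"
scp 'b2:/etc/hive/conf/hive-site.xml' "$CONF_DIR/"

# Tell SAS where the JARs and configuration files live.
export SAS_HADOOP_JAR_PATH="$JAR_DIR"
export SAS_HADOOP_CONFIG_PATH="$CONF_DIR"
```

Set the two exports in the environment the SAS server starts from, so every SAS session picks them up.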