Does anyone know of any users who have installed and configured Hadoop, Hive, and Pig directly from Apache and then successfully connected the SAS/ACCESS engine for Hadoop to them? Can SAS connect to a stand-alone Hadoop cluster using PROC HADOOP? It seems SAS/ACCESS for Hadoop requires Hive, but is that the only requirement?
Thanks. So, if I have an existing "home-brew" Hadoop cluster with Hive and Pig installed, is it possible that the SAS/ACCESS engine will be able to communicate with it? From what I understand, Hive is required in order to use the LIBNAME statement, but will PROC HADOOP (with its HDFS statements) work without Hive? Has anyone out there built their own cluster and then gotten SAS to connect?
PROC HADOOP is part of Base SAS, and I now have it working from a PC. I created a Hadoop cluster on Ubuntu 16 LTS and ran the latest hadooptracer.py script from SAS on it. Note that the configs and JARs it output are for a single-node, pseudo-distributed configuration. Together, these JARs and XML files are what SAS needs to talk to the Hadoop cluster. I copied them to my PC (they are at \\MY NETWORK SHARE\Hadoop) and ran the following program. Note that "Configured Hadoop User" has its .bashrc configured for Java and Hadoop on the Ubuntu cluster.
options set=SAS_HADOOP_JAR_PATH "\\MY NETWORK SHARE\Hadoop\lib";
options set=SAS_HADOOP_CONFIG_PATH "\\MY NETWORK SHARE\Hadoop\conf";
proc hadoop username='Configured Hadoop User on UBUNTU' password='user password';
   hdfs mkdir="/user/new";
run;
If you go to http://YOURCLUSTER FULL ADDRESS:50070/dfshealth.html#tab-overview, choose Utilities > Browse the file system, and look under “user”, you will see the new HDFS directory. So it appears to be possible to connect SAS to a home-brew cluster.
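If you want a second check beyond creating a directory, a file round trip through the same PROC HADOOP connection should exercise both write and read permissions. This is only a sketch: it assumes the two OPTIONS statements above are already in effect, and the local paths under C:\temp are made-up examples, not something from my setup.

```sas
/* Assumes SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH are already set as above. */
proc hadoop username='Configured Hadoop User on UBUNTU' password='user password';
   /* C:\temp\test.txt is a hypothetical local file; substitute any small file */
   hdfs copyfromlocal="C:\temp\test.txt" out="/user/new/test.txt";
   /* Pull it back under a different (also hypothetical) name to confirm reads work */
   hdfs copytolocal="/user/new/test.txt" out="C:\temp\test_roundtrip.txt";
run;
```

If the first statement fails but MKDIR worked, I would look at HDFS permissions on /user/new before suspecting the SAS side.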
I also got my server to connect using PROC HADOOP with HDFS statements (the code is very similar). I'd recommend the above as a troubleshooting step to see whether there are configuration issues with the cluster or SAS user permission issues. Next I'm going to try to connect the SAS/ACCESS engine to Hive on the cluster (I also installed Hive), and I'll post any results.
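For anyone who wants to try the Hive side in the meantime, the SAS/ACCESS connection would look roughly like the sketch below. I have not run this against the cluster described here, so treat it as a starting point: it assumes SAS/ACCESS Interface to Hadoop is licensed, that HiveServer2 is running on the cluster, and that it listens on port 10000 (the usual default, but an assumption for this setup). The libref name hdp is arbitrary.

```sas
/* Sketch only: not tested against the cluster described above. */
/* Assumes SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH are set as in the earlier program. */
libname hdp hadoop
   server="YOURCLUSTER FULL ADDRESS"   /* host running HiveServer2 */
   port=10000                          /* HiveServer2 default; an assumption here */
   user="Configured Hadoop User on UBUNTU"
   password="user password";

/* List whatever Hive tables are visible through the libref */
proc datasets lib=hdp;
quit;
```

If the LIBNAME statement fails while PROC HADOOP works, that would point at Hive (or the Hive JARs collected by hadooptracer.py) rather than at HDFS connectivity.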