07-20-2015 08:58 AM
I'm excited by what I see in the What's New for SAS V9.4 M3. In particular I'm excited by the idea of running a SAS Grid inside of Hadoop. The documentation (Grid Computing in SAS(R) 9.4, Fourth Edition) makes no mention of any limitations on which PROCs and DATA Step functions can be used. Do we now have a full copy of SAS inside of Hadoop?
Until now, running SAS inside Hadoop was limited in much the same way as the in-database accelerators, i.e. to DS2 with a subset of functions and the HPxxxx procedures. Is this no longer the case?
Secondly, I see no reference to the use of a clustered file system. Am I right to assume that I no longer have to grapple with clustered file systems and that SAS jobs that are executed within Hadoop by Grid Manager will get their data from HDFS?
Thanks in anticipation of further information...
08-07-2015 03:20 PM
I don't know all of the details regarding data access, but I believe you should be able to access data as normal by using a HADOOP LIBNAME or SPDE LIBNAME statement.
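For illustration, here is a minimal sketch of what those two LIBNAME statements can look like. The server name, port, schema, credentials, and paths are placeholders rather than values from this thread. The sketch also assumes that SAS/ACCESS Interface to Hadoop is licensed and that the Hadoop JAR and configuration files have already been made available to SAS:

```sas
/* Placeholder paths: tell SAS where the Hadoop client JARs and cluster config files live */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";

/* HADOOP engine: access Hive tables through SAS/ACCESS Interface to Hadoop */
/* (server, port, schema, user, and password are hypothetical) */
libname hdp hadoop server="hive.example.com" port=10000 schema=default
        user=myuser password="XXXXXXXX";

/* SPD Engine: keep SAS data sets in an HDFS directory (path is hypothetical) */
libname spdhdfs spde "/user/myuser/spde" hdfshost=default;
```

Once the librefs are assigned, tables can be referenced in the usual way, e.g. hdp.mytable or spdhdfs.mytable.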
About file systems: you'll need a shared file system for the SASGSUB checkpoint restart capability, and also if you share the SAS installation and configuration across all nodes. Data is assumed to be on HDFS, so it doesn't need to be on the shared file system.
You might want to take a look at the new document "Configuring the Hadoop Cluster for Use by SAS® Grid Manager for Hadoop" for more information - http://support.sas.com/rnd/scalability/grid/hadoop/SGMforHadoop-ConfiguringHadoop.pdf