Programming Techniques for SAS In-Memory Analytics for Hadoop

by SAS Employee DavidGhan on ‎07-06-2017 04:32 PM - edited yesterday by SAS Employee john_bauman (1,489 Views)

If you missed the Ask the Expert session on Programming Techniques for SAS In-Memory Analytics for Hadoop, you can still view it on demand at any time.

 

Watch the webinar

 

I've also attached the slides and the program code that was demonstrated.

This session

  • gives an overview of several SAS technologies and programming methods for Hadoop
  • describes the architectures that support the SAS high-performance grid and the SAS LASR grid
  • demonstrates a SAS program example that creates a SASHDAT data set in Hadoop, uses PROC LASR to start a SAS LASR server and load data into memory, and explores and analyzes the in-memory data using PROC IMSTAT

Here is a transcript of the Q&A segment held at the end of the session for ease of reference.

 

Can I use programming with PROC IMSTAT to access data in the same LASR server used for Visual Analytics?

 

Yes. If you know the IP address or host name of the machine where the Visual Analytics LASR root node is running, and you know the port, you can use PROC IMSTAT to access the LASR server used for Visual Analytics. You also need user permissions in the SAS Metadata environment to access the LASR server and its in-memory tables.
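
As a rough sketch of what such a connection might look like, the program below assigns a SASIOLA libref to a running LASR server and inspects one table with PROC IMSTAT. The host name, port, tag, and table name are placeholders, not values from the session; substitute the ones for your own Visual Analytics deployment. A metadata-managed server may need additional authorization options that are not shown here.

```sas
/* Placeholder values: replace host, port, and tag with those of your LASR server */
libname valasr sasiola host="lasr-root.example.com" port=10010 tag="hps";

proc imstat data=valasr.sales;   /* "sales" is a hypothetical in-memory table */
   tableinfo;                    /* table-level information such as the row count */
   columninfo;                   /* column names and types */
run;
quit;
```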

 

Do I have to run the LASR server in all the machines of the Hadoop cluster?

 

No. When you start the LASR server, you can use the nodes= option to specify how many nodes the server runs on. This number can be less than the total number of available nodes, which is a reasonable choice when the size of your data or the amount of processing does not require every node.
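
A minimal sketch of starting a server on a subset of nodes, assuming the nodes= option is given on the PERFORMANCE statement of PROC LASR. The host name, installation path, port, and signature-file path are placeholders and do not come from the session.

```sas
/* Placeholder host, install location, and port: adjust for your grid */
proc lasr create port=10010 path="/tmp";
   performance host="grid001.example.com"
               install="/opt/TKGrid"
               nodes=4;          /* run on 4 nodes; nodes=all would use every node */
run;
```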

 

Does LASR require Hadoop?

 

No. Distributed LASR is also supported in Teradata and EMC Greenplum environments.

 

Can I load Hive tables into LASR?

 

Yes. If you also have a license for the Code Accelerator for Hadoop, you can load Hive tables into the LASR server in parallel.

 

Much of our work is to process lots of data, e.g., to create monthly data sets and reports. Is SAS in-memory processing relevant to such activities, which would not be considered "analytics"?

 

Yes. Several statements in PROC IMSTAT are useful for generating summary reports and other metrics that describe the distribution of data values. You can also generate detailed reports, perhaps for specific subsets of your data if the data sources have a large number of rows or columns. Any results generated with PROC IMSTAT can be stored as data sets on the SAS server or in LASR memory, or saved as distributed data in HDFS.
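
For illustration only, a reporting-style PROC IMSTAT step might look like the sketch below. The libref, connection values, table name, and column names (amount, region, product) are all hypothetical, and the statements shown are just a small sample of what the procedure offers.

```sas
/* Hypothetical connection values and table: adjust for your environment */
libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";

proc imstat data=lasr.sales;
   /* Summary statistics for a numeric column, computed within each group */
   summary amount / groupby=(region);
run;
   /* Frequency counts for a category column */
   frequency product;
run;
   /* Detail rows for one subset of the data */
   where region = "East";
   fetch / from=1 to=10;
run;
quit;
```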

 

 

Recommended Resources

Course: DS2 Programming Essentials with Hadoop

 

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Library, click Subscribe in the orange bar underneath the list of recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.
