It doesn’t seem that long ago that SAS In-Memory Statistics (PROC IMSTAT and LASR) was the latest and greatest. SAS Visual Data Mining and Machine Learning (VDMML) on SAS Viya (with CAS) is aimed at the same audience and resembles PROC IMSTAT and LASR in some ways while differing in others. This article picks out a few new concepts and things that are different from SAS 9.4.
In SAS Viya the analytic engine is the Cloud Analytic Services (CAS) server, not LASR. Unlike LASR, the CAS server can be easily integrated into server startup and in most cases will be up and running, waiting for the end user. The architecture for VDMML is simple: SAS Studio is the client, and it uses a SAS workspace server. The workspace server can process code just like SAS 9.4, but it can also connect to a CAS server. SAS Studio knows about the CAS server because of these options in the workspace server SAS autoexec file.
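As a minimal sketch of what such autoexec options can look like, the CASHOST= and CASPORT= system options identify the CAS server to the SAS session. The host name below is a hypothetical placeholder, and the port is an assumption that your site may have changed:

```sas
/* Hypothetical values: substitute your own CAS controller host and port */
options cashost="cas-controller.example.com" casport=5570;
```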
The first thing you need (and always need) when interacting with a CAS server is a CAS session. You use the CAS statement to start a session and connect to the SAS Cloud Analytic Services server; the session itself is created on the server. Data access and communication are performed through the session.
Your programs communicate with the session to request actions. Many sessions can operate concurrently, but actions execute serially within a session. In most cases, programmers start and use only one session. The advantage of sessions is that they retain each user’s identity, isolate each user’s activity, run concurrently with other sessions, and allow resource tracking. This code in SAS Studio starts a session called firstsession.
Now all interactions with CAS go through the session. If I log on to the CAS Server Monitor, I will see the session.
To interact with data in CAS you need a CAS library (caslib).
CAS libraries (caslibs) are the mechanism for accessing data within SAS Cloud Analytic Services (CAS). At its simplest, a caslib provides access to files in a data source, such as a database, HDFS, or a simple file system directory, and is a container for in-memory tables. This is illustrated in the diagram below. For a brief explanation of caslibs you can view this video.
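A minimal sketch of starting a session with the CAS statement, assuming the server host and port are already known to the SAS session:

```sas
/* Start a CAS session named firstsession on the configured CAS server */
cas firstsession;
```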
Caslibs also provide a way to apply access controls to data. Caslibs can have different scopes, either session or global. Session-scope caslibs make data available only to the session that added the caslib. By default, when you add a caslib with the caslib statement, the caslib is session-scope. Global-scope caslibs make data available to multiple sessions on the server. They are useful for data sources that all programmers need to access, or when you want to share data with other users. An administrator can restrict your ability to add a global-scope caslib.
When you start a session, your personal caslib is automatically allocated and becomes the active caslib (more on what the active caslib is in a second). For example, after the session is established, if you execute:
The log shows that the caslib CASUSER is the active caslib and that it points at the user's home directory.
What does the active caslib mean? The active caslib is used for all CAS operations unless it is specifically overridden by options at the time of processing. For the session, the active caslib is the default location for server-side data access. The term "active caslib" is used rather than "default caslib" because the caslib that your session uses can be changed during the session. The code below uses the caslib statement to allocate a global caslib for a directory on the file system. The caslib is called global_lib.
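One statement that produces this kind of log output is the LIST form of the caslib statement. This is a sketch of a plausible statement, not necessarily the exact one used in the original listing:

```sas
/* Write the definition of every caslib visible to the session to the log, */
/* including which one is currently active                                */
caslib _all_ list;
```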
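A sketch of such a caslib statement, assuming the directory is /opt/mydata (the path mentioned later in this article) and that your administrator allows you to add global-scope caslibs:

```sas
/* Add a path-based caslib named global_lib and give it global scope */
caslib global_lib datasource=(srctype="path") path="/opt/mydata" global;
```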
As you can see from the log, when the caslib statement is executed the new caslib automatically becomes the active caslib. Now that we have a caslib we can access data or load data to the caslib.
So, if I have a caslib, can I use it in PROC DATASETS or PROC CONTENTS, or in a SET statement? No, I can't. To access the data in CAS from existing SAS procedures and the DATA step, you need a SAS library allocated using the CAS libname engine.
In this statement, the libname mylib with the CAS engine points to the caslib global_lib. Now I have a library mylib that can be used within the SAS session just like any SAS library.
Using the SAS DATA step you can load data to the caslib. The data is loaded from the local SAS session (workspace server) into memory in CAS, where it is available for processing. This is similar to loading data via the LASR libname in SAS 9.4.
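A minimal sketch of a CAS engine libname statement, using the libref mylib and the caslib global_lib:

```sas
/* Bind the SAS libref mylib to the caslib global_lib through the CAS engine */
libname mylib cas caslib=global_lib;
```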
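A sketch of such a load, assuming the libref mylib is assigned with the CAS engine. The choice of sashelp.iris as input is an assumption, based on the IRIS table mentioned later in this article:

```sas
/* The input is a local SAS table and the output libref uses the CAS engine, */
/* so the rows are sent from the workspace server into CAS memory            */
data mylib.iris;
   set sashelp.iris;
run;
```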
You can also load data with the new procedure PROC CASUTIL. The procedure can load and drop tables, save tables and provide information about tables. This is similar to the LASR procedure in SAS 9.4.
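As a sketch of a PROC CASUTIL load, the step below copies sashelp.cars into the caslib global_lib as a table named globalcars and promotes it to global scope:

```sas
proc casutil;
   /* Load a client-side table into CAS and promote it to global scope */
   load data=sashelp.cars outcaslib="global_lib" casout="globalcars" promote;
quit;
```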
In this example, sashelp.cars is loaded into memory in the caslib global_lib as a table named globalcars. Tables, like caslibs, have a scope of either session or global. By default, when you load a table into memory, the table has session scope, which means it is available to that session only. The PROMOTE option on CASUTIL makes the table's scope global so that it is available across sessions (there is an equivalent promote=yes DATA step option).
Now that the data is in memory, let’s run a procedure against it. PROC CARDINALITY determines a variable’s cardinality or limited cardinality. The code calculates the cardinality of the variables MAKE and ORIGIN, and outputs the results to a table. You can see in the log that the processing happens in the CAS server.
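A sketch of what the PROC CARDINALITY call might look like, assuming the CAS engine libref mylib. The output table name cars_card is taken from later in this article; the options in the original listing are not known, so only DATA=, OUTCARD=, and a VAR statement are shown:

```sas
/* Summarize the cardinality of MAKE and ORIGIN; results go to a CAS table */
proc cardinality data=mylib.globalcars outcard=mylib.cars_card;
   var make origin;
run;
```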
You can also run DATA step code within CAS. Here we calculate two additional columns and create a new table. Note that the messages in the log indicate the processing is happening in the CAS server. For a DATA step to run in CAS, both the input and output tables must use the same CAS engine libref, and all language elements in the DATA step must be supported in CAS.
My data NEWCARS, which I loaded from SASHELP and then modified, is in memory. Similar to a LASR in-memory table, if the server is stopped or I drop the table, the data is gone. To save the table for future reuse we use PROC CASUTIL and save the table (like the PROC IMSTAT save action). This code saves the table to the global_lib caslib (/opt/mydata) in a file named newcars.
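A sketch of a DATA step that can run in CAS, assuming the libref mylib uses the CAS engine. The two derived columns are hypothetical illustrations, since the article does not say which columns were actually calculated:

```sas
/* Input and output both use the CAS libref, so the step can run in CAS */
data mylib.newcars;
   set mylib.globalcars;
   mpg_avg    = (mpg_city + mpg_highway) / 2;   /* hypothetical derived column */
   price_diff = msrp - invoice;                 /* hypothetical derived column */
run;
```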
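A sketch of the save step with PROC CASUTIL. The REPLACE option is an addition here so that the step can be rerun; the exact file name and extension written to the caslib's path may depend on your release:

```sas
proc casutil;
   /* Write the in-memory table NEWCARS to the data source behind global_lib */
   save casdata="newcars" incaslib="global_lib"
        outcaslib="global_lib" casout="newcars" replace;
quit;
```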
As a different user or in a future SAS execution I can start a session and connect to the global caslib global_lib (assuming the user has permission).
Running PROC CASUTIL list will display the tables that are in memory. Notice that the IRIS table and the CARS_CARD table are not listed; only the original GLOBALCARS table is. This is because IRIS and CARS_CARD were not promoted. As a result, their table scope is session, and they are available only in the session where they were loaded.
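The new session and the table listing can be sketched as follows. The session name secondsession is a hypothetical placeholder:

```sas
/* Start a fresh session; session-scope tables from other sessions are not visible */
cas secondsession;

proc casutil;
   /* List the in-memory tables in the global caslib */
   list tables incaslib="global_lib";
quit;
```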
Since I saved NEWCARS (the version with the new calculated columns) to the CAS server I can use PROC CASUTIL to load it to memory. As the contents output shows, the data with the two additional columns is now available for processing. This code loads the table from disk to an in-memory table NEWGLOBALCARS.
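A sketch of loading the saved file back into memory and inspecting it. The file name newcars.sashdat is an assumption (the article names the file only as newcars); use LIST FILES to check the actual name in your caslib:

```sas
proc casutil;
   /* Show the files available in the caslib's data source */
   list files incaslib="global_lib";
   /* Load the saved file into an in-memory table (file name is an assumption) */
   load casdata="newcars.sashdat" incaslib="global_lib"
        outcaslib="global_lib" casout="newglobalcars";
   /* Display the column metadata of the newly loaded table */
   contents casdata="newglobalcars" incaslib="global_lib";
quit;
```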
That is a short journey through SAS Viya Machine Learning. I should say that all of this applies to the 16W20 release.
In the code I:
There are a few new concepts to get used to, like sessions, CAS libraries, and the CAS libname statement. However, if you have used LASR with PROC IMSTAT, there are similarities in the way that you load and save data, and running DATA step code directly in the CAS server is pretty cool stuff.
The good news is there is lots of available documentation.