SAS 9.4M5 has been released for 3 months and it now provides a great integration with CAS. SAS 9.4M5 is a true CAS client and it makes very easy to work with CAS, as you would normally do with the SAS engine that is part of a Viya deployment. You no longer need to use SAS/CONNECT. Among the new features and in addition to the CAS integration, M5 comes with a new version of SAS Data Integration Studio: 4.903. This version takes advantage of this better CAS integration from the SAS 9 world. Let's dive into how DI Studio now works with CAS.
I've already blogged about DI Studio integration on 9.4M4 with CAS. So, if you or your customer haven't moved yet to M5, then you might want to check out this article.
The 9.4M4 (SAS DI Studio 4.902) data transformation that works with CAS is called "Cloud Analytic Services Transfer" and allows you to move data between SAS 9.4 and CAS using SAS/CONNECT behind the scenes. This transformation is no longer available in DI Studio 4.903. But if you have designed DI jobs with this node and then have migrated to 9.4M5, these jobs will continue to work as expected, assuming you still have SAS/CONNECT on both sides (SAS9.4 and SAS Viya).
Loading data into CAS in DI Studio 4.903 is much more straightforward. There's a new built-in data transformation that is called "Cloud Analytic Services Table Loader". Before using it, let's talk about the metadata that you need to setup in order to be able to use it.
SAS9.4M5 Metadata Setup
To use CAS from SAS9.4M5 and SAS Data Integration Studio 4.903, you need:
You just have to add to your existing metadata an entry for referencing your CAS Server properties. This is done through the definition of a “Cloud Analytic Services Server” metadata object in SAS Management Console.
Simply reference the CAS Controller host machine name and its port, and set Authentication type to “(None)”.
A CAS engine library is different from a CASLIB. A CASLIB is an object managed and owned by CAS. A CAS engine library is a SAS library that points to a CASLIB. It's analogous to having an Oracle schema on one side and a SAS library that points to this Oracle schema on the other side.
The SAS Management Console lets you define a CAS engine library in the metadata, that points to an existing CASLIB in CAS. Simply choose the “SAS Cloud Analytic Services Library”
Unlike the similar configuration of the 9.4M4 bridge (relying on SAS/CONNECT), you must not assign this library to the SAS Viya application server (if it exists) but rather to the “local” SASApp application server. Indeed, this “SASApp” application server is based on a 9.4M5 engine and thus is now able to interact directly with CAS.
Define the libref and the CASLIB that you want to connect to. The CASLIB must exist in CAS.
Finally, attach the CAS Server definition that you just created in the previous step.
From an authentication perspective, if you share the same security layer between your SAS9.4 environment and your SAS Viya environment, then it should work smoothly. SAS 9.4M5 credentials get automatically passed to CAS for authentication.
In my testing environment, I have 2 systems (OS for SAS9.4M5 and LDAP for Viya) but have the same login/password in both environments. That's definitely not a best practice but it works.
Use the “Cloud Analytic Services Table Loader” transformation
Once you are ready with the metadata, you are all set to use the new “Cloud Analytic Services Table Loader” data transformation to load different types of data into CAS. This node is located in the “Access” folder.
You cannot fully take advantage of this transformation until you connect it to a target CAS table defined in the metadata. This allows the transformation to know about the CAS server and to interact dynamically with it.
Otherwise, the node will not be able to display files and tables that are currently available in CAS.
Thus, you need to import in the metadata already loaded CAS tables that will be used as targets or define yourself from scratch new CAS table metadata definitions.
All right. So, what could you do with this “CAS Table Loader”?
Load data from the SAS 9.4 environment to CAS (client-side loading)
This is probably the most natural way of using this transformation. You extract data from your SAS 9 data sources, transform it using various data manipulations and then want to make it available to your Viya users. This is done graphically by putting the right source table on the canvas and connect it to the CAS Table Loader. In most cases this should work smoothly without any change in the transformation (unless you want to change the default behavior).
You can also configure the "CAS Table Loader" by selecting the right “Load Technique” (“LOAD DATA”) and the right source table in the transformation:
As a result, the canvas will be refreshed and display the newly added source table.
Load files from the SAS 9.4 environment to CAS (client-side loading)
This is almost the same situation except that instead of SAS files, you want to load external files like CSV, Microsoft Excel, JMP files, etc. that are located on the SAS Workspace Server machine.
You can do this by selecting the right “Load Technique” (“LOAD FILE”) and the right data source file on your server.
This cannot be configured using the canvas. Therefore, the input file will not appear on the process flow.
NB: you could also prefer to use SAS DI Studio to read and parse your external file
You will then move to a “LOAD DATA” technique using the canvas.
Load data from the Viya/CAS environment to CAS (server-side loading)
Finally, this transformation enables you to fully control and orchestrate data loading in CAS, even server-side loading. This is new in this transformation, compared to what we have with the “CAS Transfer” data transformation in the previous DI Studio version. This is the “LOAD CASDATA” load technique.
Using this “Load Technique”, you will be able to:
As you can see, the transformation generates a PROC CASUTIL code that performs the server-side loading.
Options of interest
You can define the replacement strategy using the “Replace” option in the “Load Technique” tab. Notice that “Append” is only available for client-side loading of SAS data sets (LOAD DATA).
Be careful, if set to “Replace” and if the persistent copy is set to YES (see later), this option applies to both the in-memory table and the persistent copy of the table.
View output columns
In this dialog box in the “Load Technique” tab, you can select the variables to be output in the target table. It will update the target table’s metadata definition.
You can specify filters to subset the data that you want to load.
You can specify here the partition variables. CAS will organize the target table in memory over multiple nodes according to those variables. It will help further operations that require by-group processing.
It only works with the “LOAD DATA” technique.
Ordering rows only makes sense within partitions. So, you can also “Order By” variables in this data transformation. It only works with the “LOAD DATA” technique and if you have defined a partition key.
You can choose to persist the table in the target CAS library (“Options” tab). That means that you will load a table/file in CAS in-memory and this table will also be backed on disk in the CASLIB “datasource” folder (OS folder or HDFS folder).
This can be useful in client-side scenarios for fast re-loading or this can be painful. You might inadvertently generate many copies of your data. Use that with caution, default is YES! It works in conjunction with the REPLACE option, discussed earlier.
This only works with path-based CASLIBs (PATH, HDFS, DNFS) in Viya 3.2 because they provide write-back capabilities. This will work with all CASLIBs (DBMS CASLIBs for example) in Viya 3.3.
Finally, in the “Table Options” tab, you can:
Certain CAS table options might not be supported for a given load technique and a given file format (especially SASHDAT).
SAS DI Studio 4.903 provides a nice way to load data from your 9.4 environment (but also from your CAS CASLIBs) into CAS with its new “CAS Table Loader” data transformation.
Keep in mind that you must restrict the usage of CAS in SAS DI Studio to the final loading step. Don’t try to process CAS data using the standard DI data transformations. It might work or not (with respect to the DATALIMIT option which prevents the CAS client to download an entire CAS table locally; default is 100MB). But it will use the SAS 9.4 Workspace Server and all the required data will be downloaded before.
If you want to process data in CAS from DI Studio, then a “User Written Code” or a new custom “Transformation” with the right syntax, are the only options.