We’re smarter together. Learn from this collection of community knowledge and add your expertise.

CAS and SAS Data Integration Studio 4.903 (on SAS 9.4M5)

by SAS Employee NicolasRobert ‎12-18-2017 12:39 PM - edited ‎12-21-2017 05:22 AM (1,387 Views)

SAS 9.4M5 has been released for 3 months and it now provides a great integration with CAS. SAS 9.4M5 is a true CAS client and it makes very easy to work with CAS, as you would normally do with the SAS engine that is part of a Viya deployment. You no longer need to use SAS/CONNECT. Among the new features and in addition to the CAS integration, M5 comes with a new version of SAS Data Integration Studio: 4.903. This version takes advantage of this better CAS integration from the SAS 9 world. Let's dive into how DI Studio now works with CAS.

 

I've already blogged about DI Studio integration on 9.4M4 with CAS. So, if you or your customer haven't moved yet to M5, then you might want to check out this article.

 

The 9.4M4 (SAS DI Studio 4.902) data transformation that works with CAS is called "Cloud Analytic Services Transfer" and allows you to move data between SAS 9.4 and CAS using SAS/CONNECT behind the scenes. This transformation is no longer available in DI Studio 4.903. But if you have designed DI jobs with this node and then have migrated to 9.4M5, these jobs will continue to work as expected, assuming you still have SAS/CONNECT on both sides (SAS9.4 and SAS Viya).

 

Loading data into CAS in DI Studio 4.903 is much more straightforward. There's a new built-in data transformation that is called "Cloud Analytic Services Table Loader". Before using it, let's talk about the metadata that you need to setup in order to be able to use it.

 

SAS9.4M5 Metadata Setup

 

To use CAS from SAS9.4M5 and SAS Data Integration Studio 4.903, you need:

 

  • A metadata definition for your CAS Server

You just have to add to your existing metadata an entry for referencing your CAS Server properties. This is done through the definition of a “Cloud Analytic Services Server” metadata object in SAS Management Console.


CAS1.png

  

 

Simply reference the CAS Controller host machine name and its port, and set Authentication type to “(None)”.

 

CAS2.png

 

 

 

  • A metadata definition for your CAS engine library

A CAS engine library is different from a CASLIB. A CASLIB is an object managed and owned by CAS. A CAS engine library is a SAS library that points to a CASLIB. It's analogous to having an Oracle schema on one side and a SAS library that points to this Oracle schema on the other side.

 

The SAS Management Console lets you define a CAS engine library in the metadata, that points to an existing CASLIB in CAS. Simply choose the “SAS Cloud Analytic Services Library

  

CAS3.png

 

 

Unlike the similar configuration of the 9.4M4 bridge (relying on SAS/CONNECT), you must not assign this library to the SAS Viya application server (if it exists) but rather to the “local” SASApp application server. Indeed, this “SASApp” application server is based on a 9.4M5 engine and thus is now able to interact directly with CAS.  

  

CAS4.png

 

 

Define the libref and the CASLIB that you want to connect to. The CASLIB must exist in CAS.  

 

CAS5.png

 

 

Finally, attach the CAS Server definition that you just created in the previous step. 

 

CAS6.png

   

 

 

  • SAS users who can connect interchangeably to the SAS (9.4) Workspace Server and CAS

From an authentication perspective, if you share the same security layer between your SAS9.4 environment and your SAS Viya environment, then it should work smoothly. SAS 9.4M5 credentials get automatically passed to CAS for authentication.

 

In my testing environment, I have 2 systems (OS for SAS9.4M5 and LDAP for Viya) but have the same login/password in both environments. That's definitely not a best practice but it works.

 

Use the “Cloud Analytic Services Table Loader” transformation

 

Once you are ready with the metadata, you are all set to use the new “Cloud Analytic Services Table Loader” data transformation to load different types of data into CAS. This node is located in the “Access” folder.

 

You cannot fully take advantage of this transformation until you connect it to a target CAS table defined in the metadata. This allows the transformation to know about the CAS server and to interact dynamically with it.

 

CAS7.png

 

 

Otherwise, the node will not be able to display files and tables that are currently available in CAS.

 

Thus, you need to import in the metadata already loaded CAS tables that will be used as targets or define yourself from scratch new CAS table metadata definitions.

 

All right. So, what could you do with this “CAS Table Loader”?

 

Load data from the SAS 9.4 environment to CAS (client-side loading)

 

This is probably the most natural way of using this transformation. You extract data from your SAS 9 data sources, transform it using various data manipulations and then want to make it available to your Viya users. This is done graphically by putting the right source table on the canvas and connect it to the CAS Table Loader. In most cases this should work smoothly without any change in the transformation (unless you want to change the default behavior).

 

CAS8.png

 

 

You can also configure the "CAS Table Loader" by selecting the right “Load Technique” (“LOAD DATA”) and the right source table in the transformation:

 

CAS9.png

 

 

As a result, the canvas will be refreshed and display the newly added source table.

 

Load files from the SAS 9.4 environment to CAS (client-side loading)

 

This is almost the same situation except that instead of SAS files, you want to load external files like CSV, Microsoft Excel, JMP files, etc. that are located on the SAS Workspace Server machine.

 

You can do this by selecting the right “Load Technique” (“LOAD FILE”) and the right data source file on your server.

 

CAS10.png

 

 

This cannot be configured using the canvas. Therefore, the input file will not appear on the process flow.

 

CAS11.png

 

 

NB: you could also prefer to use SAS DI Studio to read and parse your external file

  • if the format is not one of the standards supported by the “LOAD FILE” technique,
  • if you want to have more control on reading options,
  • if you want to transform the data file before loading
  • or if you want to have more traceability from a lineage perspective.

You will then move to a “LOAD DATA” technique using the canvas.

 

CAS12.png

 

 

 

Load data from the Viya/CAS environment to CAS (server-side loading)

 

Finally, this transformation enables you to fully control and orchestrate data loading in CAS, even server-side loading. This is new in this transformation, compared to what we have with the “CAS Transfer” data transformation in the previous DI Studio version. This is the “LOAD CASDATA” load technique.

 

Using this “Load Technique”, you will be able to:

 

  • List the currently available CASLIBs (with respect to the CAS user authorizations)

  

CAS13.png

 

  

  • Select a source CASLIB
    • It could be the same CASLIB as the one that supports your target table (=> natural loading)
    • It could be a different CASLIB than the one that supports your target table (=> cross loading)
  • List the source files associated with a CASLIB
    • The files list comes from an OS directory, or an HDFS directory, or a database schema, or a LASR library, etc.

 

CAS14.png

 

 

  • Select a source file

  

CAS15.png

 

 

  • And specify some additional options

As you can see, the transformation generates a PROC CASUTIL code that performs the server-side loading.

 

Options of interest

 

Replace

 

You can define the replacement strategy using the “Replace” option in the “Load Technique” tab. Notice that “Append” is only available for client-side loading of SAS data sets (LOAD DATA).

 

Be careful, if set to “Replace” and if the persistent copy is set to YES (see later), this option applies to both the in-memory table and the persistent copy of the table.

 

CAS16.png

 

 

View output columns

 

In this dialog box in the “Load Technique” tab, you can select the variables to be output in the target table. It will update the target table’s metadata definition.

 

CAS17.png

 

 

Where

 

You can specify filters to subset the data that you want to load.

 

CAS18.png

 

 

Group By

 

You can specify here the partition variables. CAS will organize the target table in memory over multiple nodes according to those variables. It will help further operations that require by-group processing.

 

It only works with the “LOAD DATA” technique.

 

CAS19.png

 

 

Order By

 

Ordering rows only makes sense within partitions. So, you can also “Order By” variables in this data transformation. It only works with the “LOAD DATA” technique and if you have defined a partition key.

 

Persist table

 

You can choose to persist the table in the target CAS library (“Options” tab). That means that you will load a table/file in CAS in-memory and this table will also be backed on disk in the CASLIB “datasource” folder (OS folder or HDFS folder).

 

This can be useful in client-side scenarios for fast re-loading or this can be painful. You might inadvertently generate many copies of your data. Use that with caution, default is YES! It works in conjunction with the REPLACE option, discussed earlier.

 

CAS20.png

 

 

This only works with path-based CASLIBs (PATH, HDFS, DNFS) in Viya 3.2 because they provide write-back capabilities. This will work with all CASLIBs (DBMS CASLIBs for example) in Viya 3.3.

 

Advanced Options

 

Finally, in the “Table Options” tab, you can:

 

  • Load data in CAS in a compressed format
  • Specify the number of copies for failover
  • Load data as a repeated table (client-side loading of SAS data sets only)
  • Promote the table directly after loading

 

CAS21.png

 

 

Certain CAS table options might not be supported for a given load technique and a given file format (especially SASHDAT).

 

Conclusion

 

SAS DI Studio 4.903 provides a nice way to load data from your 9.4 environment (but also from your CAS CASLIBs) into CAS with its new “CAS Table Loader” data transformation.

 

Keep in mind that you must restrict the usage of CAS in SAS DI Studio to the final loading step. Don’t try to process CAS data using the standard DI data transformations. It might work or not (with respect to the DATALIMIT option which prevents the CAS client to download an entire CAS table locally; default is 100MB). But it will use the SAS 9.4 Workspace Server and all the required data will be downloaded before.

 

If you want to process data in CAS from DI Studio, then a “User Written Code” or a new custom “Transformation” with the right syntax, are the only options.

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.