CAS Action Computed Columns

When I last looked at CAS programming, I mentioned that, while much more powerful than SAS, CAS still looked and felt like SAS. In that post, I was talking about the DATA Step, which is, of course, the heart of SAS programming. In this post, let's look at something completely new: CAS Actions. In particular, let's focus on the COMPUTEDVARS and COMPUTEDVARSPROGRAM parameters, which define an action's computed columns (also called computed variables).

So, even though the CAS Action paradigm is new, is it still grounded in the SAS concepts we all know and love? (Hint: it is.)

CAS Action Overview

CAS Actions are the finest-grained way to call CAS. Each one instructs CAS to perform one and only one action, for example:

table.loadTable -- Loads a table into CAS
regression.glmScore -- Scores observations using a fitted linear regression model
optimization.solveLP -- Solves a linear program
clustering.kClus -- Performs K-Means clustering

In many ways, CAS Actions are similar to SAS procedures (PROCs): both generally run a well-defined processing algorithm on input data and produce some kind of output, such as tables and/or listing files (reports). SAS procedures also play a role in CAS, because certain PROCs are CAS-enabled, meaning that they trigger CAS actions to read and process CAS data. In general, a CAS-enabled PROC calls one or more CAS actions to perform its CAS processing.

CAS Action Syntax

CAS Actions can be called from SAS (PROC CAS), Python, and REST, as well as from a few other languages. While everything we talk about in this post applies to all of these interfaces, we'll use SAS here.

From SAS, CAS actions are called with PROC CAS, and the language used inside PROC CAS is called CASL. Below is an example of the table.partition action.

proc cas;
 table.partition /
 
    table={name="mega_corp" 
           caslib="visual"
           computedVars={{name="Margin"} {name="RemainingLife"}}
           computedVarsProgram="Margin = Revenue - ExpensesMaterial;
                                RemainingLife = UnitLifespan - UnitLifespanLimit;"
          }
    casout={name="MGTest" 
            caslib="casuser"
            replace=true
            compress=false
           };
 run; 
 quit;

Parameter Syntax

Within the PROC CAS block, the desired CAS action is stated (table.partition) and then various optional and required parameters for the action are set. Parameters are generally:

Value assignments (e.g. replace=true)
Lists of values (e.g. computedVars={...})
Lists of parameters (e.g. casout={...})

Commas between parameters and list items are optional and lists are enclosed in braces ({}).
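
To make the three parameter shapes concrete, here is a minimal sketch that uses the table.fetch action against the same mega_corp table. The choice of table.fetch, the fetchVars list, and the to=5 row limit are illustrative assumptions rather than part of the original example. The commas are written out here, although they could be omitted.

```sas
proc cas;
   /* Sketch only: assumes mega_corp is already loaded in the "visual" caslib */
   table.fetch /
      /* list of parameters describing the input table */
      table={name="mega_corp", caslib="visual"},
      /* list of values: the columns to return */
      fetchVars={{name="Revenue"}, {name="ExpensesMaterial"}},
      /* simple value assignment: return at most 5 rows */
      to=5;
run;
quit;
```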

Computed Variables Syntax

As shown in the example above, we created two variables:

Margin
RemainingLife

These are listed within the COMPUTEDVARS parameter. The only required attribute for a computed column within this list parameter is the name. However, you can define other attributes for your new column, including format, label, length, and precision.
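
As a hedged illustration of these optional attributes, the sketch below adds a label and a format to the Margin column. The label text and the DOLLAR12.2 format are assumptions chosen for illustration; check the action documentation for the exact attribute names that control length and precision.

```sas
proc cas;
   /* Sketch only: same partition call as above, with extra column attributes */
   table.partition /
      table={name="mega_corp",
             caslib="visual",
             computedVars={{name="Margin",
                            label="Gross Margin",     /* illustrative label */
                            format="DOLLAR12.2"}},    /* illustrative format */
             computedVarsProgram="Margin = Revenue - ExpensesMaterial;"},
      casout={name="MGTest", caslib="casuser", replace=true};
run;
quit;
```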

Each variable listed in COMPUTEDVARS must be defined in the COMPUTEDVARSPROGRAM parameter. This parameter must be set to a string containing DATA Step code. Generally, the computed variables are defined using assignment statements, but If-Then-Else blocks are also allowed, so conditional logic is possible.
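
Here is a minimal sketch of conditional logic inside COMPUTEDVARSPROGRAM. The Profitable flag is a hypothetical column invented for this illustration, and table.fetch is used only as a convenient way to look at a few rows.

```sas
proc cas;
   /* Sketch only: Profitable is a hypothetical 0/1 flag, not a column in mega_corp */
   table.fetch /
      table={name="mega_corp",
             caslib="visual",
             computedVars={{name="Profitable"}},
             /* If-Then-Else in DATA Step syntax, passed as one string */
             computedVarsProgram="if Revenue > ExpensesMaterial then Profitable = 1;
                                  else Profitable = 0;"},
      fetchVars={{name="Revenue"}, {name="ExpensesMaterial"}, {name="Profitable"}},
      to=5;
run;
quit;
```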

Use Case 1: Materializing Calculations as Columns

As in the example above, we can use this functionality to create new columns from calculations on other columns (as well as constants, macro variables, etc.). Any CAS action that reads a CAS table and writes it to a new location (Partition, Index, Shuffle, etc.) can be used to do this.

In the example, we used the partition action with no group-by variables so that it would simply write out a copy of the input CAS table with the new columns included. The new columns are calculated by the CAS action and materialized on the target CAS table.
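
One way to confirm that the new columns were materialized is to list the columns of the output table. This is a minimal sketch, assuming the partition example above has already run and written MGTest to the casuser caslib:

```sas
proc cas;
   /* Sketch only: Margin and RemainingLife should be listed with the original columns */
   table.columnInfo /
      table={name="MGTest", caslib="casuser"};
run;
quit;
```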

Use Case 2: Performing Calculations on the Fly for a Specific Analysis

While you can use computed variables to create new materialized columns on a target CAS table, you can also use them to perform one-off calculations for a specific analysis. For example, you might create new columns for a more advanced summary analysis as shown below.

proc cas;
 simple.summary /
 
    table={name="mega_corp" 
           caslib="visual"
           computedVars={{name="Margin"} {name="RemainingLife"}}
           computedVarsProgram="Margin = Revenue - ExpensesMaterial;
                                RemainingLife = UnitLifespan - UnitLifespanLimit;"
           groupBy={"unit"}
          }
    subset={"mean"}
    inputs={"profit" "revenue" "Margin" "RemainingLife"};
 run; 
 quit;

In this example, the same two fields as before, Margin and RemainingLife, are created, but they are not stored anywhere. They exist only for the life of the CAS action and are used only to enhance the output summary report.

NB: Visual Analytics Calculated Items actually manifest using CAS computed variables as shown in this use case. So they are not stored with the data. Their logic is simply applied when the CAS action for the VA report is run.

NB2: Visual Analytics Aggregated Measures do not manifest as CAS computed variables. Aggregated measures are generally computed after the CAS actions have completed.

Use Case 3: Creating Virtual Fields Associated with a CAS Table

CAS computed columns can also be defined when creating a CAS view. While the partition action (or any other action that creates an output CAS table) materializes any computedVars variables, the table.view action creates a virtual table that references the materialized columns of the input CAS table and combines them with the calculated columns (computedVars) from the view definition, which are evaluated only at query time. An example is below.

proc cas;
 table.view /
    name="virtualMega_corp"
    tables={{name="mega_corp" 
           caslib="visual"
           computedVars={{name="Margin"} {name="RemainingLife"}}
           computedVarsProgram="Margin = Revenue - ExpensesMaterial;
                                RemainingLife = UnitLifespan - UnitLifespanLimit;"
          }};
 run; 
 quit;

CAS Views have an advantage over fully materialized CAS tables in that they can save space, since calculated columns are not stored and some of them can be quite large. They also have an advantage over temporary "on-the-fly" calculated columns in that the logic only needs to be written once.
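
Once the view exists, it can be used as the input table of another action, with the computed columns referenced by name. The sketch below assumes the view above was created in the active caslib (no caslib was specified when it was defined) and that it is still available in the current session.

```sas
proc cas;
   /* Sketch only: Margin and RemainingLife are computed when the view is queried */
   simple.summary /
      table={name="virtualMega_corp"},
      inputs={"Margin", "RemainingLife"},
      subSet={"MEAN"};
run;
quit;
```

Note that the computedVars logic does not have to be repeated here, because it lives in the view definition.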

Read my previous article to learn more about CAS views.