Manage your model from R - R SASCTL is now available

Hello Everyone,

If you are a SAS Viya platform user, especially one who works with Model Management and other APIs, you have very likely come across, and even used, the amazing Python SASCTL package. But if you were an R developer, or had to push R models to SAS, you had to switch between R and Python or build your own API calls. This could be time-consuming and cumbersome.

Struggle no more. I am excited to announce the R SASCTL package, which lets you interact easily with the SAS Viya platform APIs and, of course, manage models straight from R.

Now, prepare your `<-` assignments and let's dive into how to use the package.

Install the package

Since the package is not available on CRAN, the main R repository, you will install it from our GitHub release. For additional installation methods and other details, check the documentation page.

## Installing dependencies
install.packages(c("jsonlite", "httr", "uuid", "furrr", "ROCR", "reshape2"))

## installing the package
## for this first release we will be using X.X.X = 0.6.2
install.packages("https://github.com/sassoftware/r-sasctl/releases/download/X.X.X/r-sasctl_X.X.X.tar.gz", type = "source", repos = NULL)

## loading the library
library("sasctl")
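If you expect to install newer releases later, building the URL from a single version string avoids editing it in two places. A minimal sketch; the names `pkg_version` and `release_url` are illustrative and not part of the package:

```r
# Version of the release to install (0.6.2 is the first release mentioned above)
pkg_version <- "0.6.2"

# Build the GitHub release URL from the version string
release_url <- sprintf(
  "https://github.com/sassoftware/r-sasctl/releases/download/%s/r-sasctl_%s.tar.gz",
  pkg_version, pkg_version
)
print(release_url)

# Uncomment to install (requires network access):
# install.packages(release_url, type = "source", repos = NULL)
```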

Basic usage

As usual when dealing with APIs, the first thing we do is authenticate to the SAS Viya server. There are many methods available for this. Here, I will use the most basic one, password authentication. For other methods, such as using an "authinfo" file, an authentication code, or clients, refer to the documentation. There is also a SAS Users blog post, Authentication to SAS Viya: a couple of approaches, which covers the matter in detail.

sess <- session(hostname = "https://myserver.sas.com",
                username = "username",
                password = "s3cr3t!")

The most important object to keep in mind is "sess", since it passes your authentication information to most functions from now on.
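To keep the password out of your scripts, one option is to read the credentials from environment variables. This is only a sketch: the variable names `SAS_VIYA_HOST`, `SAS_VIYA_USER` and `SAS_VIYA_PASSWORD` and the helper `read_credentials` are hypothetical, not something the package defines.

```r
# Hypothetical environment variable names; set them in ~/.Renviron or your shell
read_credentials <- function() {
  creds <- list(
    hostname = Sys.getenv("SAS_VIYA_HOST"),
    username = Sys.getenv("SAS_VIYA_USER"),
    password = Sys.getenv("SAS_VIYA_PASSWORD")
  )
  # Sys.getenv() returns "" for unset variables, so flag the empty ones
  missing <- names(creds)[!nzchar(unlist(creds))]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  creds
}

# Usage, assuming the variables are set and sasctl is loaded:
# creds <- read_credentials()
# sess <- session(hostname = creds$hostname,
#                 username = creds$username,
#                 password = creds$password)
```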

A basic capability of the package is a set of convenience functions for calling SAS Viya platform APIs: vGET, vPUT, vPOST and vDELETE. These functions not only handle API authentication, but also parse the JSON responses into lists and tables for easier access.

As a simple example, I will use the Folders API. First, we refer to the session, then we provide additional information.

folders <- vGET(sess, path = "folders/folders/")
print(names(folders))

[1] "version" "accept"  "count"   "start"  
[5] "limit"   "name"    "items"   "links"

As you can see, the response is a simple list with many objects inside it. Most of the information on this first level is about the API call. The actual results and information about the folders are inside the items object.

# showing the first 6 folders (head() returns 6 rows by default)
head(folders$items[,c("id","name",'memberCount','description')])

                                    id        name memberCount description
1 00157c78-9b03-4fd3-be93-817809429e92        Code           5        <NA>
2 002c2d3f-e003-4b3d-9e00-34bff9f4a5ea     formats           2        <NA>
3 002dae4a-e256-4ecb-b65d-f0b9b02ebd4e GitSettings           1        <NA>
4 00383427-5773-4ef8-9929-f10bc16ca9ed cdisc-cdash           4        <NA>
5 005923b2-07b9-412e-8459-c2b0cedd7832    Snippets           0        <NA>
6 00661c05-c785-4a65-bceb-f3a80649d57e My Snippets           0 My Snippet
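Because items is an ordinary data frame, the usual base R subsetting applies to it. The sketch below uses a small mock table with values copied from the output above, so it runs without a server; with a live session you would use folders$items instead.

```r
# Mock of the `items` table returned by vGET (values copied from the output above)
items <- data.frame(
  id = c("00157c78-9b03-4fd3-be93-817809429e92",
         "005923b2-07b9-412e-8459-c2b0cedd7832",
         "00661c05-c785-4a65-bceb-f3a80649d57e"),
  name = c("Code", "Snippets", "My Snippets"),
  memberCount = c(5, 0, 0),
  stringsAsFactors = FALSE
)

# Folders that contain at least one member
non_empty <- items[items$memberCount > 0, c("id", "name")]

# Look up a folder id by its name
snippets_id <- items$id[items$name == "Snippets"]

print(non_empty)
print(snippets_id)
```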

To create a new folder, we send a simple POST request with the following call.

newFolder <- vPOST(sess,
                   path = paste0("folders/folders/"),
                   query = list(parentFolderUri = folders$items$parentFolderUri[1]),
                   payload = list(name = "newFolder"),
                   httr::content_type("application/json"))

## printing the folder and omitting links because they would use a lot of space.
newFolder[-11]

$creationTimeStamp
[1] "2023-02-07T18:20:08.67038Z"

$createdBy
[1] "username"

$modifiedTimeStamp
[1] "2023-02-07T18:20:08.67038Z"

$modifiedBy
[1] "username"

$version
[1] 1

$id
[1] "e359098b-6020-4068-931a-692b74f091c1"

$name
[1] "newFolder"

$parentFolderUri
[1] "/folders/folders/5893970f-701a-4529-b2c3-7968aa3ec46a"

$type
[1] "folder"

$memberCount
[1] 0

$etag
[1] "W/\"1675794008670380000\""

And finally, to delete the folder, we send a delete call.

deletedFolder <- vDELETE(sess,
                         path = paste0("folders/folders/", newFolder$id))

The resource folders/folders/e359098b-6020-4068-931a-692b74f091c1 was successfully deleted.

Using these methods you can interact with any SAS Viya platform API. Now, let's move to a more interesting use case, where we can interact with SAS Model Manager.

SAS Model Manager and R

When working with SAS Model Manager from R, you can register, publish and manage models. But there are some restrictions on what runs directly in SAS and what may require a prior translation. Right now you can register pure R models, astores (SAS models saved from Viya using tools such as R SWAT or other GUI tools), PMML models or SPK (from SAS Enterprise Miner).

Even though you can register all of them, as of today, on SAS Viya 2023.1, you can publish and execute SAS formats (astores and SPK) in all available destinations, but R models will only run on CAS or in containers, and PMML (version 4.2) will be automatically translated to SAS code. You can reference the complete table here.

R SASCTL is not yet as advanced as its Python counterpart, which can automatically create the scoring code in the format that SAS Viya expects, but read on to see how we can do it.

Train the R model

## Obtaining our data
hmeq <- read.csv("https://support.sas.com/documentation/onlinedoc/viya/exampledatasets/hmeq.csv")

## Cleaning our table
hmeq[hmeq == ""] <- NA
hmeq <- na.omit(hmeq) ### dropping rows is rarely what you want in practice; done here for the sake of simplicity
hmeq$BAD <- as.factor(hmeq$BAD)
hmeq$REASON <- as.factor(hmeq$REASON)
hmeq$JOB <- as.factor(hmeq$JOB)

### creating train/test/val
partition <- sample(c(1,2,3), replace = TRUE, prob = c(0.7, 0.2, 0.1), size = nrow(hmeq))

### logistic regression
model1 <- glm(formula = BAD ~ .,
              family = binomial(link = 'logit'),
              data = hmeq[partition == 1,]
              )

## stepwise selection
model1 <-  MASS::stepAIC(model1, 
                         trace = 0)

### model summary
summary(model1)

Call:
glm(formula = BAD ~ JOB + DEROG + DELINQ + CLAGE + NINQ + CLNO + 
    DEBTINC, family = binomial(link = "logit"), data = hmeq[partition == 
    1, ])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8321  -0.4002  -0.2723  -0.1815   3.4436  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.112504   0.557474  -9.171  < 2e-16 ***
JOBOffice   -0.347496   0.328065  -1.059  0.28950    
JOBOther     0.218764   0.258648   0.846  0.39767    
JOBProfExe   0.227723   0.293421   0.776  0.43769    
JOBSales     1.289498   0.667405   1.932  0.05335 .  
JOBSelf      0.740324   0.493300   1.501  0.13342    
DEROG        0.756373   0.125728   6.016 1.79e-09 ***
DELINQ       0.797044   0.083996   9.489  < 2e-16 ***
CLAGE       -0.007848   0.001336  -5.872 4.30e-09 ***
NINQ         0.127731   0.043009   2.970  0.00298 ** 
CLNO        -0.019821   0.009297  -2.132  0.03301 *  
DEBTINC      0.100709   0.012481   8.069 7.08e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1397.5  on 2330  degrees of freedom
Residual deviance: 1053.8  on 2319  degrees of freedom
AIC: 1077.8

Number of Fisher Scoring iterations: 6
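Because sample() draws the partition at random, the coefficients above will differ from run to run. One way to make the split repeatable is to call set.seed() first. A small sketch; the seed value, the row count `n` and the helper `make_partition` are arbitrary placeholders:

```r
n <- 1000  # placeholder; in this post it would be nrow(hmeq)

make_partition <- function(n, seed = 42) {
  set.seed(seed)  # fixes the random number stream so the split is repeatable
  sample(c(1, 2, 3), size = n, replace = TRUE, prob = c(0.7, 0.2, 0.1))
}

p1 <- make_partition(n)
p2 <- make_partition(n)

identical(p1, p2)                 # the same seed gives the same split
round(prop.table(table(p1)), 2)   # shares should be close to 0.7 / 0.2 / 0.1
```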

Note that R automatically transforms categorical variables into dummy variables with a reference level. Verify that this encoding matches the input table. Using an explicit dummy/one-hot encoding method may be useful if you want more control. Alternatively, you could add a step that prepares the data correctly before scoring.
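To see which dummy columns R creates, and which level it treats as the reference, you can inspect the factor levels and the design matrix. The following is a sketch on a toy data frame, not on the hmeq table:

```r
# Toy training data with a categorical predictor
train <- data.frame(JOB = factor(c("Mgr", "Office", "Other", "Mgr")))

# The first level is the reference and gets no dummy column
levels(train$JOB)
dummies <- model.matrix(~ JOB, data = train)
colnames(dummies)

# Re-create new data with the training levels, so that unseen or
# misspelled values show up as NA here rather than at scoring time
new_rows <- data.frame(JOB = c("Other", "Mgr"), stringsAsFactors = FALSE)
new_rows$JOB <- factor(new_rows$JOB, levels = levels(train$JOB))

# relevel() picks a different reference level if needed
train$JOB <- relevel(train$JOB, ref = "Other")
levels(train$JOB)
```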

Create a folder to save all the required model files

# Creating a folder to save model information
dir.create("myModel")
path <- "myModel/"

Save the model and create additional files

Save the model as a standard .rda file

## Saving the model
saveRDS(model1, paste0(path, 'rlogistic.rda'), version = 2)
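Note that saveRDS() writes a single R object in RDS format regardless of the file extension, so the file is read back with readRDS() rather than load(). A sketch of a round-trip check on a toy model (the mtcars model is a stand-in, not the model from this post):

```r
# Stand-in model fitted on a built-in data set
toy_model <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# Save to a temporary location and read it back
model_file <- file.path(tempdir(), "toy_model.rda")
saveRDS(toy_model, model_file, version = 2)
restored <- readRDS(model_file)

# Predictions from the restored object should match the original
same <- all.equal(predict(toy_model, type = "response"),
                  predict(restored, type = "response"))
print(same)
```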

Score the model

Now, to create performance metrics the way the SAS Viya platform expects, we have to score our table.

## scoring the whole table
P_BAD1 <- predict(model1, newdata = hmeq, type = 'response')
P_BAD0 <- 1 - P_BAD1

# factor codes start at 1 when converted with as.numeric(),
# so we subtract 1 to move to the 0/1 scale,
# since the diagnostics expect a numeric value

scoreddf <- data.frame(BAD = as.numeric(hmeq$BAD) - 1, 
                       P_BAD1 = P_BAD1,
                       P_BAD0 = P_BAD0,
                       partition = partition)
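The subtraction works because as.numeric() on a factor returns the internal level codes, not the labels. A short illustration on a toy factor:

```r
bad <- factor(c("0", "1", "1", "0"))

as.numeric(bad)                 # internal level codes: 1 2 2 1
as.numeric(bad) - 1             # shifted to the 0/1 scale: 0 1 1 0
as.numeric(as.character(bad))   # converts the labels themselves: 0 1 1 0
```

The last form does not depend on the order of the levels, which makes it the safer choice when the labels are not simply "0" and "1".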

Then we can create all three basic diagnostic files (ROC, lift and fit statistics) at once.

diags <- diagnosticsJson(validadedf = scoreddf[scoreddf$partition == 3,],
                         traindf = scoreddf[scoreddf$partition == 1,],
                         testdf = scoreddf[scoreddf$partition == 2,],
                         targetEventValue = 1,
                         targetName = "BAD",
                         path = path)

[1] "File written to myModel/dmcas_lift.json"
[1] "File written to myModel/dmcas_roc.json"
[1] "File written to myModel/dmcas_fitstat.json"
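Before uploading, it can be reassuring to cross-check the ROC file against an independently computed AUC. Below is a sketch of the rank-based (Mann-Whitney) formula in base R, run on made-up scores. It is an independent check, not how diagnosticsJson computes its output, and `auc_rank` is a hypothetical helper:

```r
# AUC from ranks: the probability that a random event scores higher than a random non-event
auc_rank <- function(y, p) {
  r  <- rank(p)          # average ranks handle tied scores
  n1 <- sum(y == 1)      # number of events
  n0 <- sum(y == 0)      # number of non-events
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up labels and predicted probabilities
y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)
auc <- auc_rank(y, p)
print(auc)  # 0.75
```

With the real data you would pass one partition at a time, for example the BAD and P_BAD1 columns of the validation rows.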

As mentioned earlier, R SASCTL doesn't yet have the feature to automatically create the score code; however, it can create a sample of the structure in the path, so you can edit to match your needs.

create_scoreSample(path)

Example file copied to myModel/scoreCode.R

Note: if you are using RStudio, it will automatically open the file.

Next, replace the contents of the generated file with the following code, which will allow SAS to run the model on CAS.

scoreFunction <- function(LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC)
{
  #output: P_BAD0, P_BAD1, BAD
  
  #rdsPath = './' ## uncomment this for testing in local
  if (!exists("model1")) # load the model only once per R session
  {
    assign("model1", readRDS(file = paste(rdsPath, 'rlogistic.rda', sep = '')), envir = .GlobalEnv)
  }
  
  data <- data.frame(LOAN  =  LOAN, 
                     MORTDUE  =  MORTDUE, 
                     VALUE  =  VALUE, 
                     REASON  =  REASON, 
                     JOB  =  JOB, 
                     YOJ  =  YOJ, 
                     DEROG  =  DEROG, 
                     DELINQ  =  DELINQ, 
                     CLAGE  =  CLAGE, 
                     NINQ  =  NINQ, 
                     CLNO  =  CLNO, 
                     DEBTINC  =  DEBTINC)  
  
  ### scoring new data
  P_BAD1 <- predict.glm(model1, newdata = data, type = 'response')
  P_BAD0 <- 1 - P_BAD1 # probability of the non-event
  BAD <- ifelse(P_BAD1 >= 0.4, 1, 0) 
  
  ### removing names to avoid additional info in the output list
  names(P_BAD0) <- NULL
  names(P_BAD1) <- NULL
  names(BAD) <- NULL
  
  # Collect the output variables in a list, in the order declared in the "output:" comment above.
  
  output_list <- list('P_BAD0' = P_BAD0, 'P_BAD1' = P_BAD1, 'BAD' = as.character(BAD))
  return(output_list)
}
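It is worth running a score function locally before uploading it. The sketch below follows the same pattern (load the model once, build a data frame, return a named list) on a toy model, so it runs without the hmeq data. The names `toyScore`, `toy_loaded` and the `mtcars` model are stand-ins, not part of the package.

```r
# Stand-in model and file, written to a temporary directory
rdsPath <- paste0(tempdir(), "/")
toy_model <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
saveRDS(toy_model, paste0(rdsPath, "toy.rda"), version = 2)

toyScore <- function(wt)
{
  #output: P_AM0, P_AM1, AM

  # Load the model once and cache it in the global environment
  if (!exists("toy_loaded"))
  {
    assign("toy_loaded", readRDS(paste0(rdsPath, "toy.rda")), envir = .GlobalEnv)
  }

  data <- data.frame(wt = wt)
  P_AM1 <- unname(predict(toy_loaded, newdata = data, type = "response"))
  P_AM0 <- 1 - P_AM1
  AM <- ifelse(P_AM1 >= 0.4, 1, 0)

  list(P_AM0 = P_AM0, P_AM1 = P_AM1, AM = as.character(AM))
}

# Score one hypothetical row and inspect the structure of the result
result <- toyScore(wt = 2.5)
str(result)
```

For the real function, the equivalent test is to uncomment the rdsPath line, point it at the folder that holds rlogistic.rda, and call scoreFunction with the values of a single row.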

Create model files

Next, we create the remaining files that SAS Model Manager uses to configure the model when it is uploaded.

## writing other files
write_in_out_json(hmeq[,-1], input = TRUE, path = path)

write_in_out_json(scoreddf[-4], input = FALSE, path = path)

write_fileMetadata_json(scoreCodeName = "scoreCode.R",
                        scoreResource = "rlogistic.rda",
                        path = path)

write_ModelProperties_json(modelName = "Rlogistic",
                           modelFunction = "Classification",
                           trainTable = "hmeq",
                           algorithm = "Logistic Regression",
                           numTargetCategories = 2,
                           targetEvent = "1",
                           targetVariable = "BAD",
                           eventProbVar = "P_BAD1",
                           modeler = "sasctl man",
                           path = path)

Register the model

Now, we zip our files and register the model.

files_to_zip <- list.files(path, pattern = "\\.(json|R|rda)$", full.names = TRUE)
zip(paste0(path, "Rmodel.zip"), 
    files = files_to_zip)

mod <- register_model(
  session = sess,
  file = "myModel/Rmodel.zip",
  name = "RLogistic",
  type = "zip",
  project = "R_sasctl",
  force = TRUE
  )

The project with the name R_sasctl has been successfully created
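One detail worth knowing when collecting the files: the pattern argument of list.files() is a regular expression, not a shell wildcard. A sketch of an anchored pattern tested against some hypothetical file names:

```r
# Hypothetical directory listing
files <- c("dmcas_roc.json", "scoreCode.R", "rlogistic.rda",
           "Rmodel.zip", "notes.Rmd")

# Anchored regular expression: a literal dot, one of three extensions, end of name
pattern <- "\\.(json|R|rda)$"
keep <- grepl(pattern, files)
print(files[keep])

# glob2rx() converts a shell-style wildcard into the equivalent regular expression
print(utils::glob2rx("*.json"))
```

Anchoring the pattern at the end of the name keeps an existing Rmodel.zip out of the archive if you rerun the zip step.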

Verify model in SAS Viya

We can go to SAS Model Manager and check for our model and its statistics.

Since this post is quite long at this point, I'll point you to the register_model function documentation for additional examples and scoring methods. The process is the same, just with different API calls. Be sure to refer back to the documentation and repository often, as we'll offer more examples as they're created.

Finally

While the functionality and examples for R SASCTL will continue to grow over time, the package already offers a wide variety of functions for creating and managing models. Please feel free to reach out to me with any comments, questions, or feedback. Thanks a lot for the ride!