Hello Everyone,
If you are a SAS Viya platform user, especially one working with Model Management and other APIs, it is very likely you've come across, and even used, the amazing Python SASCTL package. But if you are an R developer, or need to push R models to SAS, you have had to switch between R and Python or build your own API calls, which can be time consuming and cumbersome.
Struggle no more. I am excited to announce the R SASCTL package, which lets you interact easily with the SAS Viya platform APIs and, of course, manage models straight from R.
Now, prepare your `<-` assignments and let's dive into how to use the package.
Since the package is not available on CRAN, the main R repository, you will install it from our GitHub releases. For additional installation methods and other details, check the documentation page.
## Installing dependencies
install.packages(c("jsonlite", "httr", "uuid", "furrr", "ROCR", "reshape2"))
## installing the package
## for this first release we will be using X.X.X = 0.6.2
install.packages("https://github.com/sassoftware/r-sasctl/releases/download/X.X.X/r-sasctl_X.X.X.tar.gz", type = "source", repos = NULL)
## loading the library
library("sasctl")
As usual when dealing with APIs, the first thing we are going to do is authenticate to the SAS Viya server. There are many methods available for this. Here, I will use the most basic one, password authentication. For other methods, such as using an "authinfo" file, an authorization code, or clients, refer to the documentation. There is also a SAS Users blog post, Authentication to SAS Viya: a couple of approaches, which covers the topic in detail.
sess <- session(hostname = "https://myserver.sas.com",
username = "username",
password = "s3cr3t!")
The most important object to keep in mind is `sess`, since it will provide your authentication information to most functions from now on.
A basic capability of the package is its set of convenience functions for calling SAS Viya platform APIs: vGET, vPUT, vPOST and vDELETE. These functions not only handle API authentication, but also parse the JSON responses into tables for easier access.
As a simple example, I will use the Folders API. First, we refer to the session, then we provide additional information.
folders <- vGET(sess, path = "folders/folders/")
print(names(folders))
[1] "version" "accept" "count" "start"
[5] "limit" "name" "items" "links"
As you can see, the response is a simple list with many objects inside it. Most of the information on this first level is about the API call. The actual results and information about the folders are inside the items object.
# showing the first few folders
head(folders$items[, c("id", "name", "memberCount", "description")])
id name memberCount description
1 00157c78-9b03-4fd3-be93-817809429e92 Code 5 <NA>
2 002c2d3f-e003-4b3d-9e00-34bff9f4a5ea formats 2 <NA>
3 002dae4a-e256-4ecb-b65d-f0b9b02ebd4e GitSettings 1 <NA>
4 00383427-5773-4ef8-9929-f10bc16ca9ed cdisc-cdash 4 <NA>
5 005923b2-07b9-412e-8459-c2b0cedd7832 Snippets 0 <NA>
6 00661c05-c785-4a65-bceb-f3a80649d57e My Snippets 0 My Snippet
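As a side note, the Folders API (like most SAS Viya APIs) paginates its results; the count, start and limit fields above describe the page you received. Below is a minimal sketch of asking for a larger page, assuming vGET forwards a query list of URL parameters the same way the vPOST call just below does.
## sketch: requesting a larger page of results via the limit query parameter
moreFolders <- vGET(sess,
                    path = "folders/folders/",
                    query = list(limit = 100))
nrow(moreFolders$items)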
To create a new folder, we make a simple post with the following call.
newFolder <- vPOST(sess,
                   path = "folders/folders/",
                   query = list(parentFolderUri = folders$items$parentFolderUri[1]),
                   payload = list(name = "newFolder"),
                   httr::content_type("application/json"))
## printing the folder and omitting links because they would use a lot of space.
newFolder[-11]
$creationTimeStamp
[1] "2023-02-07T18:20:08.67038Z"
$createdBy
[1] "username"
$modifiedTimeStamp
[1] "2023-02-07T18:20:08.67038Z"
$modifiedBy
[1] "username"
$version
[1] 1
$id
[1] "e359098b-6020-4068-931a-692b74f091c1"
$name
[1] "newFolder"
$parentFolderUri
[1] "/folders/folders/5893970f-701a-4529-b2c3-7968aa3ec46a"
$type
[1] "folder"
$memberCount
[1] 0
$etag
[1] "W/\"1675794008670380000\""
And finally, to delete the folder, we send a delete call.
deletedFolder <- vDELETE(sess,
path = paste0("folders/folders/", newFolder$id))
The resource folders/folders/e359098b-6020-4068-931a-692b74f091c1 was successfully deleted.
Using these methods you can interact with any SAS Viya platform API. Now, let's move to a more interesting use case, where we can interact with SAS Model Manager.
When working with SAS Model Manager from R, you can register, publish and manage models. There are, however, some restrictions on what runs directly in SAS and what may require a prior translation. Right now you can register pure R models, astores (SAS models saved from SAS Viya using tools such as R SWAT or other GUI tools), PMML models, or SPK files (from SAS Enterprise Miner).
## Obtaining our data
hmeq <- read.csv("https://support.sas.com/documentation/onlinedoc/viya/exampledatasets/hmeq.csv")
## Cleaning our table
hmeq[hmeq == ""] <- NA
hmeq <- na.omit(hmeq) ### you probably do not want to do this in practice, but we keep it simple here
hmeq$BAD <- as.factor(hmeq$BAD)
hmeq$REASON <- as.factor(hmeq$REASON)
hmeq$JOB <- as.factor(hmeq$JOB)
### creating train/test/val
partition <- sample(c(1,2,3), replace = TRUE, prob = c(0.7, 0.2, 0.1), size = nrow(hmeq))
### logistic regression
model1 <- glm(formula = BAD ~ .,
family = binomial(link = 'logit'),
data = hmeq[partition == 1,]
)
## stepwise selection
model1 <- MASS::stepAIC(model1,
trace = 0)
### model summary
summary(model1)
Call:
glm(formula = BAD ~ JOB + DEROG + DELINQ + CLAGE + NINQ + CLNO +
DEBTINC, family = binomial(link = "logit"), data = hmeq[partition ==
1, ])
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8321 -0.4002 -0.2723 -0.1815 3.4436
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.112504 0.557474 -9.171 < 2e-16 ***
JOBOffice -0.347496 0.328065 -1.059 0.28950
JOBOther 0.218764 0.258648 0.846 0.39767
JOBProfExe 0.227723 0.293421 0.776 0.43769
JOBSales 1.289498 0.667405 1.932 0.05335 .
JOBSelf 0.740324 0.493300 1.501 0.13342
DEROG 0.756373 0.125728 6.016 1.79e-09 ***
DELINQ 0.797044 0.083996 9.489 < 2e-16 ***
CLAGE -0.007848 0.001336 -5.872 4.30e-09 ***
NINQ 0.127731 0.043009 2.970 0.00298 **
CLNO -0.019821 0.009297 -2.132 0.03301 *
DEBTINC 0.100709 0.012481 8.069 7.08e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1397.5 on 2330 degrees of freedom
Residual deviance: 1053.8 on 2319 degrees of freedom
AIC: 1077.8
Number of Fisher Scoring iterations: 6
# Creating a folder to save model information
dir.create("myModel")
path <- "myModel/"
## Saving the model
saveRDS(model1, paste0(path, 'rlogistic.rda'), version = 2)
## scoring the whole table
P_BAD1 <- predict(model1, newdata = hmeq, type = 'response')
P_BAD0 <- 1 - P_BAD1
# factors start at 1 when converted with as.numeric,
# so we subtract 1 to get back to the 0/1 scale,
# since the diagnostics expect a numeric value
scoreddf <- data.frame(BAD = as.numeric(hmeq$BAD) - 1,
P_BAD1 = P_BAD1,
P_BAD0 = P_BAD0,
partition = partition)
diags <- diagnosticsJson(validadedf = scoreddf[scoreddf$partition == 3,],
traindf = scoreddf[scoreddf$partition == 1,],
testdf = scoreddf[scoreddf$partition == 2,],
targetEventValue = 1,
targetName = "BAD",
path = path)
[1] "File written to myModel/dmcas_lift.json"
[1] "File written to myModel/dmcas_roc.json"
[1] "File written to myModel/dmcas_fitstat.json"
create_scoreSample(path)
Example file copied to myModel/scoreCode.R
We then edit the copied scoreCode.R so that it loads the saved model and scores incoming rows, ending up with the function below.
scoreFunction <- function(LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC)
{
#output: P_BAD0, P_BAD1, BAD
#rdsPath = './' ## uncomment this line when testing locally
if (!exists("model1"))
{
assign("model1", readRDS(file = paste(rdsPath, 'rlogistic.rda', sep = '')), envir = .GlobalEnv)
}
data <- data.frame(LOAN = LOAN,
MORTDUE = MORTDUE,
VALUE = VALUE,
REASON = REASON,
JOB = JOB,
YOJ = YOJ,
DEROG = DEROG,
DELINQ = DELINQ,
CLAGE = CLAGE,
NINQ = NINQ,
CLNO = CLNO,
DEBTINC = DEBTINC)
### scoring new data
P_BAD1 <- predict.glm(model1, newdata = data, type = 'response')
P_BAD0 <- 1 - P_BAD1
BAD <- ifelse(P_BAD1 >= 0.4, 1, 0)
### removing names to avoid additional info in the output list
names(P_BAD0) <- NULL
names(P_BAD1) <- NULL
names(BAD) <- NULL
# Include scoring logic here to get a list of the output variables.
output_list <- list('P_BAD0' = P_BAD0, 'P_BAD1' = P_BAD1, 'BAD' = as.character(BAD))
return(output_list)
}
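Before packaging everything up, it is worth a quick local sanity check of the score code. A minimal sketch, assuming you saved the edited function above back to myModel/scoreCode.R:
## sketch: testing the score function locally before uploading
rdsPath <- "myModel/"           # so readRDS() can find rlogistic.rda outside SAS Viya
source("myModel/scoreCode.R")   # loads scoreFunction() as edited above
row1 <- hmeq[1, ]
scoreFunction(row1$LOAN, row1$MORTDUE, row1$VALUE, row1$REASON, row1$JOB, row1$YOJ,
              row1$DEROG, row1$DELINQ, row1$CLAGE, row1$NINQ, row1$CLNO, row1$DEBTINC)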
## writing other files
write_in_out_json(hmeq[,-1], input = TRUE, path = path)
write_in_out_json(scoreddf[-4], input = FALSE, path = path)
write_fileMetadata_json(scoreCodeName = "scoreCode.R",
scoreResource = "rlogistic.rda",
path = path)
write_ModelProperties_json(modelName = "Rlogistic",
modelFunction = "Classification",
trainTable = "hmeq",
algorithm = "Logistic Regression",
numTargetCategories = 2,
targetEvent = "1",
targetVariable = "BAD",
eventProbVar = "P_BAD1",
modeler = "sasctl man",
path = path)
files_to_zip <- list.files(path, pattern = "\\.json$|\\.R$|\\.rda$", full.names = TRUE)
zip(paste0(path, "Rmodel.zip"),
files = files_to_zip)
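If you want to confirm what ended up in the archive before registering it, base R's unzip() can list the contents; a quick sketch:
## sketch: listing the files inside the model archive
unzip(paste0(path, "Rmodel.zip"), list = TRUE)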
mod <- register_model(
session = sess,
file = "myModel/Rmodel.zip",
name = "RLogistic",
type = "zip",
project = "R_sasctl",
force = TRUE
)
The project with the name R_sasctl has been successfully created
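As a last check, the same low-level verbs work against the Model Repository API, so we can fetch the freshly registered model's metadata back. A minimal sketch, assuming register_model() returns the created model object including its id:
## sketch: retrieving the registered model's metadata from the Model Repository API
registeredModel <- vGET(sess, path = paste0("modelRepository/models/", mod$id))
registeredModel$name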
While the functionality and examples for R SASCTL will continue to grow over time, the package already offers a wide variety of functions for creating and managing models. Please feel free to reach out to me with any comments, questions, or feedback. Thanks a lot for the ride!