
Running SAS Models in Azure Synapse and Databricks Without Invoking SAS

Started ‎10-13-2022
Modified ‎10-13-2022

In SAS Viya, we can publish and run a SAS scoring model in several target data platforms:

 

  • Hadoop Cloud Services
  • Cloudera Data Platform
  • Databricks
  • Azure Synapse Analytics
  • Teradata

 

A question that often comes up is whether SAS models, once published, can be run directly from within the target data platform, without running a SAS program. Indeed, this makes sense when you want to embed a scoring step in a larger data engineering process without mixing technologies and handling complex integration points.

 

Recently, such capabilities have been added to Azure Synapse and Databricks. It is now possible to run SAS models inside Azure Synapse and Databricks without invoking SAS or running a SAS program.

 

To do so, we will be using the Scala and Python APIs that were released in SAS Viya 2021.2.2. Keep in mind that to use these APIs:

 

  • SAS In-Database Technologies for Databricks or Azure Synapse must be licensed (it is included in some SAS Viya offerings and can be added to others)
  • The SAS Embedded Process must be installed on the target platform

 

Let’s highlight some of the important instructions by looking at a Scala example on Azure Synapse.

 

First, you have to import the package that contains the implementation of the Model class:

 

import com.sas.spark.scoring._

 

To score data, we need to load the input table into a Spark dataset:

 

var inDataset = spark.table("default.hmeq_spark")

 

Then, we need to create a model object from a model that was previously published to ADLS from SAS Viya:

 

var mymodel = Model.create(inDataset,"abfss://blobdata@mystorageaccount.dfs.core.windows.net/models/01_gradboost_astore/01_gradboost_astore.is")

 

ABFSS is the driver to use in Azure Synapse to access data in ADLS (Azure Data Lake Storage). 01_gradboost_astore is the name of the SAS model published to ADLS from SAS.
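As a side note, the ABFSS URI follows a fixed pattern. A small helper like the one below (hypothetical, not part of the SAS API, and assuming the models/<name>/<name>.is layout used in this article) can make the path less error-prone:

```python
# Hypothetical helper (not part of the SAS API): build the ABFSS URI for a
# SAS model published to an ADLS Gen2 container, assuming the
# models/<name>/<name>.is layout used in this article.
def abfss_model_path(container, storage_account, model_name):
    """Return the abfss:// URI for a published SAS model (.is file)."""
    return (f"abfss://{container}@{storage_account}.dfs.core.windows.net"
            f"/models/{model_name}/{model_name}.is")

path = abfss_model_path("blobdata", "mystorageaccount", "01_gradboost_astore")
# path == "abfss://blobdata@mystorageaccount.dfs.core.windows.net/models/01_gradboost_astore/01_gradboost_astore.is"
```

The resulting string matches the path passed to Model.create above.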

 

Optionally, we can add some options to the model:

 

mymodel.setDBMaxText(2000)
mymodel.setTraceON

 

Check the documentation for additional information on the options available.

 

Then we are ready to run the SAS model. This produces an output Spark dataset:

 

var dfout = mymodel.run

 

We may also want to save the output dataset as a Spark table:

 

dfout.write.mode("overwrite").saveAsTable("default.hmeq_spark_astore_api")

 

Here we go! We have run a SAS scoring model directly in the Azure Synapse ecosystem, and the scoring results are immediately available in the output Spark table.

 

What about an example with Python and Databricks?

 

Here are the equivalent Python instructions, run against Databricks in this case:

 

# Import the Model class from the SAS Embedded Process scoring package
from sasep.model import Model

# Load the input table into a Spark DataFrame
hmeqin = spark.table("default.hmeq_prod")

# Create a model from the ASTORE file published to the mounted ADLS container
mymodel = Model.create(hmeqin, "dbfs:/mnt/adls/models/01_gradboost_astore/01_gradboost_astore.is")

# Optional settings
mymodel.setDBMaxText(2000)
mymodel.setTraceON()

# Run the model; this returns an output Spark DataFrame
hmeqout = mymodel.run()

# Save the scored output as a Spark table
# (note: in PySpark, write is a property, not a method)
hmeqout.write.mode("overwrite").saveAsTable("default.hmeq_out_api")

 

Notice in this case that we have to mount the ADLS blob container (or an S3 bucket if we run on AWS) to the Databricks file system, hence the dbfs driver pointing to a mount point.
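Assuming the container is mounted under /mnt/adls as in the example above, a companion helper (again hypothetical, following the same model folder layout) can build the dbfs path to a published model:

```python
# Hypothetical helper: build the dbfs:/ path to a published SAS model,
# assuming the ADLS (or S3) container is mounted under the given mount point
# and that models are stored as models/<name>/<name>.is.
def dbfs_model_path(mount_point, model_name):
    """Return the dbfs:/ path for a published SAS model (.is file)."""
    return f"dbfs:{mount_point}/models/{model_name}/{model_name}.is"

path = dbfs_model_path("/mnt/adls", "01_gradboost_astore")
# path == "dbfs:/mnt/adls/models/01_gradboost_astore/01_gradboost_astore.is"
```

The resulting string matches the path passed to Model.create in the Python example.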

 

You can find complete examples in the documentation. You can use both APIs interchangeably with Azure Synapse and Databricks.  

 

Many thanks to my colleagues Maggie Marcum, Josh Mcclung, David Ghazaleh and Alex Fang for their help.

