In SAS Viya, we can publish and run a SAS scoring model in several target data platforms.
A question that often comes up is whether SAS models, once published, can be run directly from within the target data platform, without running a SAS program. Indeed, this makes sense when you want to embed a scoring phase as part of a larger data engineering process without mixing technologies and handling complex integration points.
Recently, such capabilities have been added for Azure Synapse and Databricks. It is now possible to run SAS models inside Azure Synapse and Databricks without invoking SAS or running a SAS program.
To do so, we will be using the Scala and Python APIs released in SAS Viya 2021.2.2. Keep in mind that these APIs come with a few requirements; check the documentation for details.
Let’s highlight some of the important instructions by looking at a Scala example on Azure Synapse.
First, you have to import the package that contains the implementation of the Model class:
import com.sas.spark.scoring._
To score data, we need to load the input table in a Spark dataset:
var inDataset = spark.table("default.hmeq_spark")
Then, we need to create a model object that points to a model previously published to ADLS from SAS Viya:
var mymodel = Model.create(inDataset,"abfss://blobdata@mystorageaccount.dfs.core.windows.net/models/01_gradboost_astore/01_gradboost_astore.is")
ABFSS is the driver used in Azure Synapse to access a blob in ADLS. 01_gradboost_astore is the name of the SAS model published to ADLS from SAS.
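The abfss URI follows a regular pattern; as a minimal illustrative sketch, here is how such a path can be assembled in Python. The container, storage account, and model names are the hypothetical ones from the example above, and the `/models/<name>/<name>.is` layout simply mirrors the path used in that example.

```python
def adls_model_path(container: str, account: str, model: str) -> str:
    """Compose an abfss URI for a model ASTORE file published to ADLS.

    Assumes the "/models/<name>/<name>.is" layout used in the example;
    your publishing destination may differ.
    """
    return (f"abfss://{container}@{account}.dfs.core.windows.net"
            f"/models/{model}/{model}.is")

# Hypothetical names, matching the example above.
path = adls_model_path("blobdata", "mystorageaccount", "01_gradboost_astore")
print(path)
```

This reproduces the exact path string passed to Model.create in the Scala snippet.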
Optionally, we can add some options to the model:
mymodel.setDBMaxText(2000)
mymodel.setTraceON
Check the documentation for additional information on the options available.
Then we are ready to run the SAS model. This produces an output Spark dataset:
var dfout = mymodel.run
Potentially, we may want to save the output dataset as a Spark table:
dfout.write.mode("overwrite").saveAsTable("default.hmeq_spark_astore_api")
Here we go! We have run a SAS scoring model directly in the Azure Synapse ecosystem, and we can immediately leverage the scoring insights contained in the output Spark table.
What about an example with Python and Databricks?
Here are the equivalent Python instructions used against Databricks in this case:
from sasep.model import Model
hmeqin = spark.table("default.hmeq_prod")
mymodel = Model.create(hmeqin, "dbfs:/mnt/adls/models/01_gradboost_astore/01_gradboost_astore.is")
mymodel.setDBMaxText(2000)
mymodel.setTraceON()
hmeqout = mymodel.run()
hmeqout.write.mode("overwrite").saveAsTable("default.hmeq_out_api")
Notice that in this case we have to mount the ADLS blob container (or an S3 bucket if we run on AWS) to the Databricks File System, hence the dbfs scheme pointing to a mount point.
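Once the container is mounted, the abfss URI used earlier maps onto a dbfs path under the mount point. As a purely illustrative Python sketch (the URI and the /mnt/adls mount point are hypothetical), the mapping looks like this:

```python
def to_dbfs_path(abfss_uri: str, mount_point: str) -> str:
    """Translate an abfss:// URI into the dbfs:/ path seen through a
    Databricks mount of the same container.

    Illustrative only: assumes the whole container is mounted at mount_point.
    """
    # Drop the scheme and the container@account.dfs.core.windows.net
    # authority, keeping only the path inside the container.
    path_in_container = abfss_uri.split(".dfs.core.windows.net", 1)[1]
    return f"dbfs:{mount_point}{path_in_container}"

uri = ("abfss://blobdata@mystorageaccount.dfs.core.windows.net"
       "/models/01_gradboost_astore/01_gradboost_astore.is")
print(to_dbfs_path(uri, "/mnt/adls"))
```

The result matches the dbfs path passed to Model.create in the Python snippet above.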
You can find complete examples in the documentation. You can use both APIs interchangeably with Azure Synapse and Databricks.
Many thanks to my colleagues Maggie Marcum, Josh Mcclung, David Ghazaleh and Alex Fang for their help.