
Running SAS Models in Azure Synapse and Databricks Without Invoking SAS

Started ‎10-13-2022
Modified ‎10-13-2022

In SAS Viya, we can publish and run a SAS scoring model in several target data platforms:

 

  • Hadoop Cloud Services
  • Cloudera Data Platform
  • Databricks
  • Azure Synapse Analytics
  • Teradata

 

A question that often comes up is whether SAS models, once published, can be run directly from within the target data platform, without running a SAS program. Indeed, this makes sense when you want to embed a scoring step in a larger data engineering process without mixing technologies and handling complex integration points.

 

Recently, such capabilities have been added to Azure Synapse and Databricks. It is now possible to run SAS models inside Azure Synapse and Databricks without invoking SAS or running a SAS program.

 

To do so, we will be using the Scala and Python APIs that were released in SAS Viya 2021.2.2. Keep in mind that to use these APIs:

 

  • SAS In-Database Technologies for Databricks or Azure Synapse must be licensed (it is included in some SAS Viya offerings and can be added to others)
  • The SAS Embedded Process must be installed on the target platform

 

Let’s highlight some of the important instructions by looking at a Scala example on Azure Synapse.

 

First, you have to import the package that contains the implementation of the Model class:

 

import com.sas.spark.scoring._

 

To score data, we need to load the input table into a Spark dataset:

 

var inDataset = spark.table("default.hmeq_spark")

 

Then, we need to create a model object from a model that was previously published to ADLS from SAS Viya:

 

var mymodel = Model.create(inDataset,"abfss://blobdata@mystorageaccount.dfs.core.windows.net/models/01_gradboost_astore/01_gradboost_astore.is")

 

ABFSS is the driver to use in Azure Synapse to access data in ADLS (Azure Data Lake Storage). 01_gradboost_astore is the name of the SAS model published to ADLS from SAS.
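As a side note, the ABFSS URI follows a fixed pattern. A small helper like the one below (hypothetical, not part of the SAS API, and assuming the models/<name>/<name>.is layout used in this article) can make the path less error-prone:

```python
# Hypothetical helper (not part of the SAS API): build the ABFSS URI for a
# SAS model published to an ADLS Gen2 container, assuming the
# models/<name>/<name>.is layout used in this article.
def abfss_model_path(container, storage_account, model_name):
    """Return the abfss:// URI for a published SAS model (.is file)."""
    return (f"abfss://{container}@{storage_account}.dfs.core.windows.net"
            f"/models/{model_name}/{model_name}.is")

path = abfss_model_path("blobdata", "mystorageaccount", "01_gradboost_astore")
# path == "abfss://blobdata@mystorageaccount.dfs.core.windows.net/models/01_gradboost_astore/01_gradboost_astore.is"
```

The resulting string matches the path passed to Model.create above.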

 

Optionally, we can add some options to the model:

 

mymodel.setDBMaxText(2000)
mymodel.setTraceON

 

Check the documentation for additional information on the options available.

 

Then we are ready to run the SAS model. This produces an output Spark dataset:

 

var dfout = mymodel.run

 

We may also want to save the output dataset as a Spark table:

 

dfout.write.mode("overwrite").saveAsTable("default.hmeq_spark_astore_api")

 

Here we go! We have run a SAS scoring model directly in the Azure Synapse ecosystem, and the scoring results are immediately available in the output Spark table.

 

What about an example with Python and Databricks?

 

Here are the equivalent Python instructions, run against Databricks in this case:

 

# Import the Model class from the SAS Embedded Process scoring package
from sasep.model import Model

# Load the input table into a Spark DataFrame
hmeqin = spark.table("default.hmeq_prod")

# Create a model from the ASTORE file published to the mounted ADLS container
mymodel = Model.create(hmeqin, "dbfs:/mnt/adls/models/01_gradboost_astore/01_gradboost_astore.is")

# Optional settings
mymodel.setDBMaxText(2000)
mymodel.setTraceON()

# Run the model; this returns an output Spark DataFrame
hmeqout = mymodel.run()

# Save the scored output as a Spark table
# (note: in PySpark, write is a property, not a method)
hmeqout.write.mode("overwrite").saveAsTable("default.hmeq_out_api")

 

Notice in this case that we have to mount the ADLS blob container (or an S3 bucket if we run on AWS) to the Databricks file system, hence the dbfs driver pointing to a mount point.
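Assuming the container is mounted under /mnt/adls as in the example above, a companion helper (again hypothetical, following the same model folder layout) can build the dbfs path to a published model:

```python
# Hypothetical helper: build the dbfs:/ path to a published SAS model,
# assuming the ADLS (or S3) container is mounted under the given mount point
# and that models are stored as models/<name>/<name>.is.
def dbfs_model_path(mount_point, model_name):
    """Return the dbfs:/ path for a published SAS model (.is file)."""
    return f"dbfs:{mount_point}/models/{model_name}/{model_name}.is"

path = dbfs_model_path("/mnt/adls", "01_gradboost_astore")
# path == "dbfs:/mnt/adls/models/01_gradboost_astore/01_gradboost_astore.is"
```

The resulting string matches the path passed to Model.create in the Python example.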

 

You can find complete examples in the documentation. You can use both APIs interchangeably with Azure Synapse and Databricks.  

 

Many thanks to my colleagues Maggie Marcum, Josh Mcclung, David Ghazaleh and Alex Fang for their help.

