In my last post, I wrote about publishing and running a SAS scoring model in Azure Databricks. Let’s focus now on scoring in Azure Synapse Analytics. The overall process is quite similar.
As with Azure Databricks, we need to install the SAS Embedded Process in Azure Synapse Analytics. The deployment steps are documented here.
As a reminder, the SAS Embedded Process is a lightweight SAS engine deployed on a cluster (here, a Spark pool) that takes advantage of the cluster infrastructure. Essentially, it can run SAS code in parallel on the cluster's distributed data.
To score data in Azure Synapse, we need to publish the model to ADLS (Azure Data Lake Storage), which Azure Synapse accesses behind the scenes.
Indeed, when you create an Azure Synapse workspace, you are asked to link an ADLS Gen2 filesystem (blob container) to the workspace. This ADLS container is where the SAS models will be published.
The overall publishing and running process is depicted below:
We need an ADLS caslib to publish a SAS model to ADLS and a Spark caslib to run it in Azure Synapse.
In addition, because Azure Synapse combines both data lake (Spark) and data warehouse (SQL Server) capabilities, we might need additional caslibs to manipulate or view data:
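For instance, to browse tables stored in a Synapse dedicated SQL pool, one could define a caslib using the SAS Data Connector to Microsoft SQL Server. The following is only a sketch; the server, database, schema, and credential values are placeholders, and the exact options available depend on your SAS Viya deployment:

```sas
/* Hypothetical caslib to view data in a Synapse dedicated SQL pool */
/* via the SQL Server data connector; all values are placeholders   */
caslib sqlpool datasource=
   (
      srctype="sqlserver",
      server="**synapse-workspace**.sql.azuresynapse.net",
      database="**my-sql-pool**",
      schema="dbo",
      username="**my-sql-user**",
      password="**my-sql-password**"
   ) libref=sqlpool ;
```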
To publish a SAS model for consumption in Azure Synapse, I only need an ADLS caslib. This is exactly the same step as for Databricks (check out the previous blog post for more details). The ADLS storage account that we publish to must be the one linked to your Azure Synapse workspace. The code looks like the following:
caslib adls datasource=
   (
      srctype="adls",
      accountname="**my-storage-account**",
      filesystem="**my-container**",
      applicationid="**my-application-id**",
      resource="https://storage.azure.com/",
      dnssuffix="dfs.core.windows.net"
   ) subdirs libref=adls ;

proc scoreaccel sessref=mysession ;
   publishmodel
      target=filesystem
      caslib="adls"
      password="**my-application-secret**"
      modelname="01_gradboost_astore"
      storetables="spark.gradboost_store"
      modeldir="/models"
      replacemodel=yes ;
quit ;
To run a SAS model in Azure Synapse, we need a Spark caslib. This Spark caslib simply acts as a placeholder for the connection details of the Spark pool in Synapse.
Then we start a continuous Spark session of the SAS Embedded Process; within Synapse, we can specify how many resources we want to allocate to that Spark session.
/* Used for running models in Synapse */
caslib spark datasource=
   (
      srctype="spark",
      platform=synapse,
      username="**my-application-id**",
      password="**my-application-secret**",
      server="**synapse-workspace**.dev.azuresynapse.net",
      schema="sqlpool",
      hadoopJarPath="/azuredm/access-clients/spark/jars/sas",
      resturl="**livy-rest-url**",
      bulkload=no
   ) libref=spark ;

/* Start the SAS Embedded Process */
proc cas ;
   sparkEmbeddedProcess.startSparkEP
      caslib="spark"
      trace=false
      executorInstances=4
      executorCores=4
      executorMemory=56
      driverMemory=32 ;
quit ;
We can run the model now:
/* Run the model */
proc scoreaccel sessref=mysession ;
   runmodel
      target=synapse
      caslib="spark"
      modelname="01_gradboost_astore"
      modeldir="/models"
      intable="hmeq_spark"
      schema="default"
      outtable="hmeq_spark_astore"
      outschema="default"
      forceoverwrite=yes ;
quit ;
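To sanity-check the scored output, one option is to load the result table from the Spark caslib into CAS and preview a few rows. A minimal sketch, reusing the table and caslib names from the example above (the `hmeq_scored` output name is illustrative):

```sas
/* Load the scored Spark table into CAS */
proc casutil ;
   load casdata="hmeq_spark_astore" incaslib="spark"
        casout="hmeq_scored" outcaslib="casuser" ;
quit ;

/* Preview the first few scored rows */
proc cas ;
   table.fetch / table={name="hmeq_scored", caslib="casuser"}, to=5 ;
quit ;
```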
Scoring data in Synapse is very flexible in terms of input and output data objects. As depicted in the following figure, you can take several routes to score data in Synapse from SAS:
The following options drive the type of source/target data structure accessed:
You can even interact with the Spark session before or after the model execution, for pre- or post-processing:
/* Load a filtered Spark table into a Spark dataset */
proc cas ;
   sparkEmbeddedProcess.executeProgram
      caslib="spark"
      program="var dsin = spark.table(""default.hmeq_spark"").where($""REASON"" === ""DebtCon"");" ;
quit ;
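The same mechanism could handle post-processing, for example persisting a filtered view of the scored output as a new Spark table. This is only a sketch: the `P_BAD1` column and the `hmeq_high_risk` target table are hypothetical names, to be adjusted to your model's actual output:

```sas
/* Hypothetical post-processing: save high-risk scored rows as a Spark table */
proc cas ;
   sparkEmbeddedProcess.executeProgram
      caslib="spark"
      program="spark.table(""default.hmeq_spark_astore"").where($""P_BAD1"" > 0.5).write.mode(""overwrite"").saveAsTable(""default.hmeq_high_risk"");" ;
quit ;
```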
The program option accepts user-written Scala code. Once you are done running all your models, you can stop the continuous Spark session:
proc cas ;
   sparkEmbeddedProcess.stopSparkEP caslib="spark" ;
quit ;
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.