Streamlined SAS Scoring Model Deployment for Databricks, Azure Synapse, and More

In the past, I shared insights on how to publish and execute SAS scoring models in both Databricks and Azure Synapse Analytics:

Publish and Run a SAS Scoring Model In Azure Databricks

Publish and Run a SAS Scoring Model In Azure Synapse Analytics

Back then, deploying these models required using a cloud object storage location. The models would be accessed by the target platforms through a proprietary mechanism, such as a mount or link.

Previous Databricks (on Azure) Scoring Workflow


While this approach worked, it involved multiple connectivity points and relied on mechanisms that the vendor might no longer recommend (for instance, Databricks has deprecated DBFS mounts). This added unnecessary complexity and occasionally led to errors.

The New and Improved Process

Starting with SAS Viya LTS 2024.03, publishing SAS scoring models has become much simpler for Databricks (both Azure and AWS), Azure Synapse Analytics, and Azure HDInsight. You can now publish models directly to a table within the target platform:

Databricks: Publish to a Spark table
Azure Synapse Analytics: Publish to a SQL Server table
Azure HDInsight: Publish to a Hive table

Let's revisit the Databricks example to see how the scoring process looks now:

Now, a single CASLIB manages both the publishing and execution of the scoring model, significantly reducing complexity.

Code Example: Publishing and Running a Model in Databricks

Here’s how you can publish a model to a Spark table and execute it in Databricks:

/* Create a Spark caslib */
caslib spark datasource=
   (
      srctype="spark",
      platform=databricks,
      bulkload=no,
      server="&SERVER",
      clusterid="&CLUSTERID",
      username="&USERNAME",
      password="&AUTHTOKEN",
      jobManagementURL="&JOBMANAGEMENTURL",
      httpPath="&HTTPPATH",
      properties="Catalog='&DB_CATALOG';UseLegacyDataModel=true;Other=ConnectRetryWaitTime=20;DefaultColumnSize=1024",
      schema="&DB_SCHEMA"
   ) libref=spark ;

/* Publish a model in a Spark table (assumes a CAS session named mysession already exists) */
proc scoreaccel sessref=mysession ;
   publishmodel
      exttype=databricks
      caslib="spark"
      modelname="gradboost_astore"
      storetables="spark.gradboost_store"
      replacemodel=yes ;
quit ;

/* Start the SAS Embedded Process */
proc cas ;
   sparkEmbeddedProcess.startSparkEP caslib="spark" ;
quit ;

/* Run the model stored in a Spark table */
proc scoreaccel sessref=mysession ;
   runmodel 
      exttype=spark
      caslib="spark"
      modelname="gradboost_astore" modeldatabase="&DB_SCHEMA"
      intable="hmeq_prod" schema="&DB_SCHEMA"
      outtable="hmeq_prod_out_astore" outschema="&DB_SCHEMA"
      forceoverwrite=true ;
quit ;

Naming Convention Update

When you publish a model to a Spark table, the table follows a consistent naming convention: its name is the model's name prefixed with "sasmodel_".
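
To make the convention concrete, here is a minimal sketch that derives the expected table name for the model published earlier. The macro variables MODELNAME and MODELTABLE are hypothetical names introduced only for illustration, and the sketch assumes the table name is the literal prefix followed by the model name unchanged, in the schema given by &DB_SCHEMA from the example above. Check the actual table name in your Databricks catalog before relying on it.

```sas
/* Hypothetical macro variables, not part of the original example */
%let MODELNAME = gradboost_astore ;

/* Assumption: published table name = "sasmodel_" prefix + model name */
%let MODELTABLE = sasmodel_&MODELNAME ;

/* Write the expected schema-qualified table name to the SAS log */
/* (the double period ends the DB_SCHEMA reference and emits one literal period) */
%put NOTE: Expected model table: &DB_SCHEMA..&MODELTABLE ;
```

Under these assumptions, the model gradboost_astore would be found in a table named sasmodel_gradboost_astore.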

Conclusion

This update simplifies the SAS scoring model deployment process by consolidating the steps and reducing connectivity issues, making it easier to integrate SAS predictive analytics into your Databricks, Azure Synapse, or Azure HDInsight environment.

Thanks for reading!

Find more articles from SAS Global Enablement and Learning here.