
Publish and Run a SAS Scoring Model in SingleStore

Started 12-18-2023
Modified 12-18-2023

SAS Viya with SingleStore is a solution that embeds a SingleStore database in the SAS analytics platform. Together, SAS Viya and SingleStore let customers run SAS analytics on data stored in a modern high-performance database, avoiding data duplication, minimizing data movement and reducing hardware needs. SAS Viya with SingleStore was recently complemented with scoring acceleration, which makes running the entire analytics lifecycle inside SingleStore a reality.

 

Indeed, you can:

  • Prepare data in SingleStore (ETL/ELT capabilities in SAS Studio, pass-through, etc.)
  • Build predictive models on SingleStore data (data streaming and caching, multi-pass analytics, etc.)
  • Publish and run models directly in SingleStore (scoring acceleration)

In this article, I will focus on SingleStore scoring acceleration, which was released in May 2023 and is therefore available in LTS 2023.10.

 

Lifecycle

 

The simplified lifecycle below illustrates what usually needs to be done to reach the goal: scoring database data in place, without moving it.

 

nir_post_92_01_lifecycle.png


 

 

 

Designing the model is probably the most important and time-consuming phase (and I’m not even talking about the data preparation beforehand that takes a lot of time).

 

The two remaining steps (publish and run) are simple technical steps. With SingleStore embedded in SAS Viya’s platform, the process is smooth and requires no additional configuration.

 

Design the model

 

Here, I will just show a code example of a gradient boosting model, along with the caslib definition. Gradient boosting requires multiple passes over the data, so I set the multipassMemory option to “cache” to temporarily cache the data set in CAS for better performance.

 

 

/* SingleStore caslib */
caslib s2 dataSource=(srctype='singlestore',
   database="&MYS2DB",
   epdatabase="&MYS2DBWORK",
   host="&MYS2HOST",
   pass="&MYS2PWD",
   port=&MYS2PORT,
   user="&MYS2UID",
   multipassMemory="cache"
) libref=cass2 ;
 
/* Load the SingleStore training table in CAS - no duplication */
proc casutil incaslib="s2" outcaslib="s2" ;
   dropTable casdata="hmeq_train" quiet ;
   load casdata="hmeq_train" casout="hmeq_train" ;
quit ;
 
/* Gradient Boosting modeling */
proc gradboost data=cass2.hmeq_train seed=12345 ;
   id record_pk ;
   input Delinq Derog Job nInq Reason / level = nominal ;
   input CLAge CLNo DebtInc Loan Mortdue Value YoJ / level = interval ;
   target Bad / level = nominal ;
   /* Save an analytic store */
   savestate rstore=cass2.gradboost_store ;
run ;
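
The code above relies on a CAS session and on a set of macro variables that are defined elsewhere. As a minimal sketch, the setup could look like the following and would run before the caslib statement. All values are placeholders that I am assuming for illustration (host, port, database names, credentials); only the macro variable names and the session name mysession, used later in this article, come from the article's code.

```sas
/* Assumed setup: placeholder values only, replace with your own */
%let MYS2HOST   = s2host.example.com ;  /* hypothetical SingleStore host */
%let MYS2PORT   = 3306 ;                /* assumed port */
%let MYS2DB     = mydb ;                /* hypothetical database holding the data */
%let MYS2DBWORK = mydb_work ;           /* hypothetical database for epdatabase= */
%let MYS2UID    = myuser ;              /* hypothetical user */
%let MYS2PWD    = mypassword ;          /* hypothetical password */

/* Start the CAS session referenced later as sessref=mysession */
cas mysession ;
```

In practice, you would probably avoid a clear-text password in a program and rely on whatever credential mechanism your site uses.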

 

What is interesting with SAS Viya with SingleStore is that this model is designed directly on SingleStore data, which is filtered in SingleStore, streamed in real time from SingleStore and cached in CAS only for the duration of the process. The input table does not need to be permanently copied to CAS beforehand. See SAS Viya with SingleStore: Data Flow Concept.

 

Notice the output of the modeling phase: an ASTORE (analytic store) stored in a CAS table that we will use later. Other model types are supported too.
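
If you want to check what the analytic store contains before publishing it, one option is the DESCRIBE statement of PROC ASTORE. This is a small sketch that is not part of the original workflow. It assumes the cass2 libref and the gradboost_store table created by the code above.

```sas
/* Describe the analytic store saved by PROC GRADBOOST */
proc astore ;
   describe rstore=cass2.gradboost_store ;
quit ;
```

The output lists the input and output variables of the model, which is useful to confirm that the production table has the columns the model expects.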

 

Publish the model

 

If you have SAS Viya with SingleStore, everything is already configured. Publishing a model simply consists of running the following piece of code:

 

/* Publish the model */
proc scoreaccel sessref=mysession ;
   publishmodel
      target=singlestore
      caslib="s2"
      modelname="GradientBoosting_Code"
      modeltype=ds2
      storetables="s2.gradboost_store"
      modelnotes="Simple gradient boosting test model"
      keeplist=yes
      replacemodel=yes
      ;
quit ;

Here, we want to publish the model currently stored in the gradboost_store CAS table, created during the design phase.

 

As a result, a SingleStore table, named sasmodel.GradientBoosting_Code, is created and contains the SAS model:

 

nir_post_92_02_sasmodel_table.png

 

Run the model

 

This is the final step. Now that the SAS model is in SingleStore, we will be able to score SingleStore data without having to move the data elsewhere. This is done with the following code:

 

/* Run the model in SingleStore */
/* No Data Movement */
proc scoreaccel sessref=mysession ;
   runmodel
      target=singlestore
      caslib="s2"
      modelname="GradientBoosting_Code"
      intable="hmeq_prod"
      outtable="hmeq_prod_code_scored"
      outkey="RECORD_PK"
      verbose
      ;
quit ;

 

We specify the model we want to use, the input table to read, the output table to create and the key we want to use (the output table will only contain the key plus the model output variables). This uses the SAS Embedded Process (deployed automatically in SingleStore) behind the scenes.

 

The scoring output table will look like this:

 

nir_post_92_03_model_output.png
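
To look at the result from SAS, one possibility is to load the scored table through the same caslib and print a few rows. This is a sketch only. It assumes that the output table is created in the database the s2 caslib points to, and it reuses the cass2 libref defined earlier. It reads data into CAS for inspection purposes; the scoring itself still happened in SingleStore.

```sas
/* Make the scored SingleStore table visible in CAS */
proc casutil incaslib="s2" outcaslib="s2" ;
   dropTable casdata="hmeq_prod_code_scored" quiet ;
   load casdata="hmeq_prod_code_scored" casout="hmeq_prod_code_scored" ;
quit ;

/* Print the first 10 rows: the key plus the model output variables */
proc print data=cass2.hmeq_prod_code_scored(obs=10) ;
run ;
```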

 

Iterate again

 

Of course, this lifecycle is based on iterations. A model can perform well one day but be less relevant a month later, so you might want to publish and run new models.
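
Since each iteration repeats the same publish and run steps, they can be wrapped in a macro. The sketch below is a hypothetical helper: the macro name and its parameters are my own, and it only reuses options already shown in this article. I have not verified how runmodel behaves when the output table already exists, so using a new output table name per iteration is the cautious choice.

```sas
/* Hypothetical helper macro: name and parameters are illustrative */
%macro publish_and_score(store=, model=, intable=, outtable=, key=RECORD_PK) ;

   /* Publish (or replace) the model held in the analytic store table */
   proc scoreaccel sessref=mysession ;
      publishmodel
         target=singlestore
         caslib="s2"
         modelname="&model"
         modeltype=ds2
         storetables="&store"
         replacemodel=yes
         ;
   quit ;

   /* Score the input table in SingleStore with the published model */
   proc scoreaccel sessref=mysession ;
      runmodel
         target=singlestore
         caslib="s2"
         modelname="&model"
         intable="&intable"
         outtable="&outtable"
         outkey="&key"
         ;
   quit ;

%mend publish_and_score ;

/* Example call with the names used in this article */
%publish_and_score(store=s2.gradboost_store,
                   model=GradientBoosting_Code,
                   intable=hmeq_prod,
                   outtable=hmeq_prod_code_scored) ;
```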

 

You can also manage your models in SingleStore and remove old ones:

 

/* Delete a model from SingleStore database */
proc scoreaccel sessref=mysession ;
   deletemodel
      target=singlestore
      caslib="s2"
      modelname="GradientBoosting_Code"
      ;
quit ;

Scoring acceleration is an important step in reducing data movement between SingleStore and SAS Viya. With it, SAS Viya with SingleStore lets you keep combining the benefits of a high-performance database with enterprise-class analytics.

 

Thanks for reading.
