BookmarkSubscribeRSS Feed

Publish and Run a Model In-Database From Code - A Teradata Example

Started ‎11-03-2021 by
Modified ‎11-03-2021 by
Views 3,676

Following up my introductory article about publishing and running a model in-database, I wanted to illustrate this process from different perspectives. In this first scenario, we will put ourselves in the position of the person who designs models using SAS code.

 

So, what’s the goal again?

We are in the process of building a model in SAS using code on a sample SAS data set, and ultimately, we want to run this model against a big Teradata table. Of course, we don’t want to move the Teradata table into SAS, run the score in SAS and push back the scored data into Teradata. That would be slow, unnecessarily duplicate the data, overuse the bandwidth, etc.

 

Instead, we want to move the scoring algorithm closer to the data, so that we can leverage the Teradata infrastructure without moving the data.

 

Design the model

All right, building analytical models is not my specialty. I will probably oversimplify this phase. My apologies 😉!

 

Let's say I sampled my target table into a SAS data set that I then loaded in CAS. I’m now able to train a boosting model using the Gradient Boosting procedure:

 

/* Gradient Boosting modeling */
proc gradboost data=casdata.hmeq_train seed=12345 ;
   id record_pk ;
   input Delinq Derog Job nInq Reason / level = nominal ;
   input CLAge CLNo DebtInc Loan Mortdue Value YoJ / level = interval ;
   target Bad / level = nominal ;
   /* Save an analytic store */
   savestate rstore=casdata.gradboost_store ;
run ;

 

What I need to achieve my assigned goal is to create an analytic store for the model I trained. This is done using the savestate statement, which creates an ASTORE file and saves it as a binary object in a CAS table. I’m now ready to extract and push that model outside of SAS.  

 

Publish the model in Teradata

By publishing the model in Teradata, we mean that the SAS analytic store (the astore file) will be moved to Teradata in a table (which will act as a repository of models). Then we will be able to use that astore file in Teradata independently of SAS.

 

To publish a model in Teradata, we can use the SCOREACCEL procedure (and the publishmodel statement) or the modelPublishing.publishModelExternal CAS action (the SCOREACCEL publishmodel statement calls this CAS action behind the scenes):

 

/* Publish the model */
proc scoreaccel sessref=mysession ;
   publishmodel
      target=teradata
      caslib="tera"
      modelname="GradientBoosting_Code"
      modeltype=ds2
      modeltable="sas_model_table"
      storetables="data.gradboost_store"
      modelnotes="Simple gradient boosting test model"
      keeplist=yes
      replacemodel=yes
      ;
quit ;

 

In this example, we define:

 

  • Teradata as our target
  • A Teradata CASLIB (must exist before)
  • The model name (unique identifier) in the repository
  • The model type (could be DATASTEP or DS2) – here we provide an astore file as input so it’s a DS2 model
  • The model table in Teradata that will act as a repository of models (the first time, it will be created in the referenced CASLIB)
  • The input model which here is the CAS table containing the astore file (storetables parameter)
  • Some additional options such as whether the model should be replaced if it already exists, limiting the output variables to the strict minimum, etc.

 

Partial log:

 

80      publishmodel
81         target=teradata
82         caslib="tera"
83         modelname="GradientBoosting_Code"
84         modeltype=ds2
85         modeltable="sas_model_table"
86         storetables="data.gradboost_store"
87         modelnotes="Simple gradient boosting test model"
88         keeplist=yes
89         replacemodel=yes
90         ;
NOTE: Executing action 'modelPublishing.publishModelExternal'.
NOTE: Running 'modelPublishing' action set with 3 workers.
NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb.
NOTE: Executing SQL command to create the model table: CREATE SET TABLE "gelindb"."sas_model_table" (ModelName VARCHAR(128) 
      CHARACTER SET UNICODE NOT CASESPECIFIC, ModelDS2 BLOB(2097088000), ModelFormats BLOB(2097088000), ModelUUID VARCHAR(36) 
      CHARACTER SET UNICODE NOT CASESPECIFIC, Notes VARCHAR(512) CHARACTER SET UNICODE NOT CASESPECIFIC) UNIQUE PRIMARY INDEX 
      (ModelName);.
NOTE: Model 'GradientBoosting_Code' has been successfully published to the external database.
NOTE: Action 'modelPublishing.publishModelExternal' used (Total process time):
NOTE:       real time               0.854720 seconds
NOTE:       cpu time                0.214032 seconds (25.04%)
NOTE:       total nodes             4 (8 cores)
NOTE:       total memory            62.55G
NOTE:       memory                  13.57M (0.02%)
91   quit ;
NOTE: PROCEDURE SCOREACCEL used (Total process time):
      real time           0.91 seconds
      cpu time            0.06 seconds

 

Since this is the first time we publish a model in Teradata, we can observe the creation of the model table in the log.

 

This model table in Teradata looks like this:

 

nir_post_69_01_model_table.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

It contains the astore file in a BLOB variable (ModelDS2).  

 

Run the model in Teradata

We are ready for the final step of this process: scoring the Teradata data using the published model. We can do that in 2 ways, depending on how you want to orchestrate this scoring phase: would you like to run it as part of a SAS job? Or would you like to run it as part of a Teradata job?  

 

From SAS

To run a model in Teradata from SAS, we can use the SCOREACCEL procedure (and the runmodel statement) or the modelPublishing.runModelExternal CAS action (the SCOREACCEL runmodel statement calls this CAS action behind the scenes):

 

proc scoreaccel sessref=mysession ;
   runmodel
      target=teradata
      caslib="tera"
      modelname="GradientBoosting_Code"
      modeltable="sas_model_table"
      intable="hmeq_prod"
      outtable="hmeq_prod_code_scored"
      outkey="RECORD_PK"
      ;
quit ;

 

In this example, we define:

 

  • Teradata as our target
  • A Teradata CASLIB (must exist before)
  • The model name (unique identifier) that we want to run
  • The model table in Teradata that contains the list of models (exists in the Teradata CASLIB above)
  • The input Teradata table that we want to score
  • The output Teradata table that we want to create
  • The variable that we want to use as the primary key in the output table

 

Partial log:

 

87      runmodel
88         target=teradata
89         caslib="tera"
90         modelname="GradientBoosting_Code"
91         modeltable="sas_model_table"
92         intable="hmeq_prod"
93         outtable="hmeq_prod_code_scored"
94         outkey="RECORD_PK"
95         ;
NOTE: Executing action 'modelPublishing.runModelExternal'.
NOTE: Running 'modelPublishing' action set with 3 workers.
NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb.
NOTE: Executing SQL command to run the SAS Embedded Process: CALL SAS_SYSFNLIB.SAS_SCORE_EP('INQUERY=SELECT * FROM 
      "gelindb"."hmeq_prod"', 'MODELTABLE="gelindb"."sas_model_table"', 'MODELNAME=GradientBoosting_Code', 
      'OUTTABLE="gelindb"."hmeq_prod_code_scored"', 'OUTKEY=RECORD_PK', 'OPTIONS=').
NOTE: Execution of model 'GradientBoosting_Code' succeeded.
NOTE: Action 'modelPublishing.runModelExternal' used (Total process time):
NOTE:       real time               4.016990 seconds
NOTE:       cpu time                0.036026 seconds (0.90%)
NOTE:       total nodes             4 (8 cores)
NOTE:       total memory            62.55G
NOTE:       memory                  2.15M (0.00%)
96   quit ;
NOTE: PROCEDURE SCOREACCEL used (Total process time):
      real time           4.06 seconds
      cpu time            0.04 seconds

 

We can observe that we are running the SAS Embedded Process. Under the covers, a Teradata Stored Procedure is called to score the data. The Teradata output table will contain the output variables from the model including the prediction information.  

 

From Teradata

As you have probably noticed, we saw in the last step what Teradata SQL command SAS was sending to Teradata to score the data using the SAS Embedded Process. We can definitely use this command directly in Teradata as well:

 

CALL SAS_SYSFNLIB.SAS_SCORE_EP(
   'INQUERY=SELECT * FROM "gelindb"."hmeq_prod"',
   'MODELTABLE="gelindb"."sas_model_table"',
   'MODELNAME=GradientBoosting_Code',
   'OUTTABLE="gelindb"."hmeq_prod_code_scored_from_bteq"',
   'OUTKEY=RECORD_PK',
   'OPTIONS=');

 

Conclusion

In this article, we have illustrated how a SAS user can design a model and run it in Teradata without moving data from a code perspective.

 

Thanks for reading. 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎11-03-2021 05:35 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags