Following up my introductory article about publishing and running a model in-database, I wanted to illustrate this process from different perspectives. In this first scenario, we will put ourselves in the position of the person who designs models using SAS code.
We are in the process of building a model in SAS using code on a sample SAS data set, and ultimately, we want to run this model against a big Teradata table. Of course, we don’t want to move the Teradata table into SAS, run the score in SAS and push back the scored data into Teradata. That would be slow, unnecessarily duplicate the data, overuse the bandwidth, etc.
Instead, we want to move the scoring algorithm closer to the data, so that we can leverage the Teradata infrastructure without moving the data.
All right, building analytical models is not my specialty. I will probably oversimplify this phase. My apologies 😉!
Let's say I sampled my target table into a SAS data set that I then loaded in CAS. I’m now able to train a boosting model using the Gradient Boosting procedure:
/* Gradient Boosting modeling */ proc gradboost data=casdata.hmeq_train seed=12345 ; id record_pk ; input Delinq Derog Job nInq Reason / level = nominal ; input CLAge CLNo DebtInc Loan Mortdue Value YoJ / level = interval ; target Bad / level = nominal ; /* Save an analytic store */ savestate rstore=casdata.gradboost_store ; run ;
What I need to achieve my assigned goal is to create an analytic store for the model I trained. This is done using the savestate statement, which creates an ASTORE file and saves it as a binary object in a CAS table. I’m now ready to extract and push that model outside of SAS.
By publishing the model in Teradata, we mean that the SAS analytic store (the astore file) will be moved to Teradata in a table (which will act as a repository of models). Then we will be able to use that astore file in Teradata independently of SAS.
To publish a model in Teradata, we can use the SCOREACCEL procedure (and the publishmodel statement) or the modelPublishing.publishModelExternal CAS action (the SCOREACCEL publishmodel statement calls this CAS action behind the scenes):
/* Publish the model */ proc scoreaccel sessref=mysession ; publishmodel target=teradata caslib="tera" modelname="GradientBoosting_Code" modeltype=ds2 modeltable="sas_model_table" storetables="data.gradboost_store" modelnotes="Simple gradient boosting test model" keeplist=yes replacemodel=yes ; quit ;
In this example, we define:
80 publishmodel 81 target=teradata 82 caslib="tera" 83 modelname="GradientBoosting_Code" 84 modeltype=ds2 85 modeltable="sas_model_table" 86 storetables="data.gradboost_store" 87 modelnotes="Simple gradient boosting test model" 88 keeplist=yes 89 replacemodel=yes 90 ; NOTE: Executing action 'modelPublishing.publishModelExternal'. NOTE: Running 'modelPublishing' action set with 3 workers. NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb. NOTE: Executing SQL command to create the model table: CREATE SET TABLE "gelindb"."sas_model_table" (ModelName VARCHAR(128) CHARACTER SET UNICODE NOT CASESPECIFIC, ModelDS2 BLOB(2097088000), ModelFormats BLOB(2097088000), ModelUUID VARCHAR(36) CHARACTER SET UNICODE NOT CASESPECIFIC, Notes VARCHAR(512) CHARACTER SET UNICODE NOT CASESPECIFIC) UNIQUE PRIMARY INDEX (ModelName);. NOTE: Model 'GradientBoosting_Code' has been successfully published to the external database. NOTE: Action 'modelPublishing.publishModelExternal' used (Total process time): NOTE: real time 0.854720 seconds NOTE: cpu time 0.214032 seconds (25.04%) NOTE: total nodes 4 (8 cores) NOTE: total memory 62.55G NOTE: memory 13.57M (0.02%) 91 quit ; NOTE: PROCEDURE SCOREACCEL used (Total process time): real time 0.91 seconds cpu time 0.06 seconds
Since this is the first time we publish a model in Teradata, we can observe the creation of the model table in the log.
This model table in Teradata looks like this:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
It contains the astore file in a BLOB variable (ModelDS2).
We are ready for the final step of this process: scoring the Teradata data using the published model. We can do that in 2 ways, depending on how you want to orchestrate this scoring phase: would you like to run it as part of a SAS job? Or would you like to run it as part of a Teradata job?
To run a model in Teradata from SAS, we can use the SCOREACCEL procedure (and the runmodel statement) or the modelPublishing.runModelExternal CAS action (the SCOREACCEL runmodel statement calls this CAS action behind the scenes):
proc scoreaccel sessref=mysession ; runmodel target=teradata caslib="tera" modelname="GradientBoosting_Code" modeltable="sas_model_table" intable="hmeq_prod" outtable="hmeq_prod_code_scored" outkey="RECORD_PK" ; quit ;
In this example, we define:
87 runmodel 88 target=teradata 89 caslib="tera" 90 modelname="GradientBoosting_Code" 91 modeltable="sas_model_table" 92 intable="hmeq_prod" 93 outtable="hmeq_prod_code_scored" 94 outkey="RECORD_PK" 95 ; NOTE: Executing action 'modelPublishing.runModelExternal'. NOTE: Running 'modelPublishing' action set with 3 workers. NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb. NOTE: Executing SQL command to run the SAS Embedded Process: CALL SAS_SYSFNLIB.SAS_SCORE_EP('INQUERY=SELECT * FROM "gelindb"."hmeq_prod"', 'MODELTABLE="gelindb"."sas_model_table"', 'MODELNAME=GradientBoosting_Code', 'OUTTABLE="gelindb"."hmeq_prod_code_scored"', 'OUTKEY=RECORD_PK', 'OPTIONS='). NOTE: Execution of model 'GradientBoosting_Code' succeeded. NOTE: Action 'modelPublishing.runModelExternal' used (Total process time): NOTE: real time 4.016990 seconds NOTE: cpu time 0.036026 seconds (0.90%) NOTE: total nodes 4 (8 cores) NOTE: total memory 62.55G NOTE: memory 2.15M (0.00%) 96 quit ; NOTE: PROCEDURE SCOREACCEL used (Total process time): real time 4.06 seconds cpu time 0.04 seconds
We can observe that we are running the SAS Embedded Process. Under the covers, a Teradata Stored Procedure is called to score the data. The Teradata output table will contain the output variables from the model including the prediction information.
As you have probably noticed, we saw in the last step what Teradata SQL command SAS was sending to Teradata to score the data using the SAS Embedded Process. We can definitely use this command directly in Teradata as well:
CALL SAS_SYSFNLIB.SAS_SCORE_EP( 'INQUERY=SELECT * FROM "gelindb"."hmeq_prod"', 'MODELTABLE="gelindb"."sas_model_table"', 'MODELNAME=GradientBoosting_Code', 'OUTTABLE="gelindb"."hmeq_prod_code_scored_from_bteq"', 'OUTKEY=RECORD_PK', 'OPTIONS=');
In this article, we have illustrated how a SAS user can design a model and run it in Teradata without moving data from a code perspective.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.