Following up my introductory article about publishing and running a model in-database, I wanted to illustrate this process from different perspectives. In this first scenario, we will put ourselves in the position of the person who designs models using SAS code.
We are in the process of building a model in SAS using code on a sample SAS data set, and ultimately, we want to run this model against a big Teradata table. Of course, we don’t want to move the Teradata table into SAS, run the score in SAS and push back the scored data into Teradata. That would be slow, unnecessarily duplicate the data, overuse the bandwidth, etc.
Instead, we want to move the scoring algorithm closer to the data, so that we can leverage the Teradata infrastructure without moving the data.
All right, building analytical models is not my specialty. I will probably oversimplify this phase. My apologies 😉!
Let's say I sampled my target table into a SAS data set that I then loaded in CAS. I’m now able to train a boosting model using the Gradient Boosting procedure:
/* Gradient Boosting modeling */
proc gradboost data=casdata.hmeq_train seed=12345 ;
id record_pk ;
input Delinq Derog Job nInq Reason / level = nominal ;
input CLAge CLNo DebtInc Loan Mortdue Value YoJ / level = interval ;
target Bad / level = nominal ;
/* Save an analytic store */
savestate rstore=casdata.gradboost_store ;
run ;
What I need to achieve my assigned goal is to create an analytic store for the model I trained. This is done using the savestate statement, which creates an ASTORE file and saves it as a binary object in a CAS table. I’m now ready to extract and push that model outside of SAS.
By publishing the model in Teradata, we mean that the SAS analytic store (the astore file) will be moved to Teradata in a table (which will act as a repository of models). Then we will be able to use that astore file in Teradata independently of SAS.
To publish a model in Teradata, we can use the SCOREACCEL procedure (and the publishmodel statement) or the modelPublishing.publishModelExternal CAS action (the SCOREACCEL publishmodel statement calls this CAS action behind the scenes):
/* Publish the model */
proc scoreaccel sessref=mysession ;
publishmodel
target=teradata
caslib="tera"
modelname="GradientBoosting_Code"
modeltype=ds2
modeltable="sas_model_table"
storetables="data.gradboost_store"
modelnotes="Simple gradient boosting test model"
keeplist=yes
replacemodel=yes
;
quit ;
In this example, we define:
Partial log:
80 publishmodel
81 target=teradata
82 caslib="tera"
83 modelname="GradientBoosting_Code"
84 modeltype=ds2
85 modeltable="sas_model_table"
86 storetables="data.gradboost_store"
87 modelnotes="Simple gradient boosting test model"
88 keeplist=yes
89 replacemodel=yes
90 ;
NOTE: Executing action 'modelPublishing.publishModelExternal'.
NOTE: Running 'modelPublishing' action set with 3 workers.
NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb.
NOTE: Executing SQL command to create the model table: CREATE SET TABLE "gelindb"."sas_model_table" (ModelName VARCHAR(128)
CHARACTER SET UNICODE NOT CASESPECIFIC, ModelDS2 BLOB(2097088000), ModelFormats BLOB(2097088000), ModelUUID VARCHAR(36)
CHARACTER SET UNICODE NOT CASESPECIFIC, Notes VARCHAR(512) CHARACTER SET UNICODE NOT CASESPECIFIC) UNIQUE PRIMARY INDEX
(ModelName);.
NOTE: Model 'GradientBoosting_Code' has been successfully published to the external database.
NOTE: Action 'modelPublishing.publishModelExternal' used (Total process time):
NOTE: real time 0.854720 seconds
NOTE: cpu time 0.214032 seconds (25.04%)
NOTE: total nodes 4 (8 cores)
NOTE: total memory 62.55G
NOTE: memory 13.57M (0.02%)
91 quit ;
NOTE: PROCEDURE SCOREACCEL used (Total process time):
real time 0.91 seconds
cpu time 0.06 seconds
Since this is the first time we publish a model in Teradata, we can observe the creation of the model table in the log.
This model table in Teradata looks like this:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
It contains the astore file in a BLOB variable (ModelDS2).
We are ready for the final step of this process: scoring the Teradata data using the published model. We can do that in 2 ways, depending on how you want to orchestrate this scoring phase: would you like to run it as part of a SAS job? Or would you like to run it as part of a Teradata job?
To run a model in Teradata from SAS, we can use the SCOREACCEL procedure (and the runmodel statement) or the modelPublishing.runModelExternal CAS action (the SCOREACCEL runmodel statement calls this CAS action behind the scenes):
proc scoreaccel sessref=mysession ;
runmodel
target=teradata
caslib="tera"
modelname="GradientBoosting_Code"
modeltable="sas_model_table"
intable="hmeq_prod"
outtable="hmeq_prod_code_scored"
outkey="RECORD_PK"
;
quit ;
In this example, we define:
Partial log:
87 runmodel
88 target=teradata
89 caslib="tera"
90 modelname="GradientBoosting_Code"
91 modeltable="sas_model_table"
92 intable="hmeq_prod"
93 outtable="hmeq_prod_code_scored"
94 outkey="RECORD_PK"
95 ;
NOTE: Executing action 'modelPublishing.runModelExternal'.
NOTE: Running 'modelPublishing' action set with 3 workers.
NOTE: Connected to: host=10.1.1.4 user=gel database=gelindb.
NOTE: Executing SQL command to run the SAS Embedded Process: CALL SAS_SYSFNLIB.SAS_SCORE_EP('INQUERY=SELECT * FROM
"gelindb"."hmeq_prod"', 'MODELTABLE="gelindb"."sas_model_table"', 'MODELNAME=GradientBoosting_Code',
'OUTTABLE="gelindb"."hmeq_prod_code_scored"', 'OUTKEY=RECORD_PK', 'OPTIONS=').
NOTE: Execution of model 'GradientBoosting_Code' succeeded.
NOTE: Action 'modelPublishing.runModelExternal' used (Total process time):
NOTE: real time 4.016990 seconds
NOTE: cpu time 0.036026 seconds (0.90%)
NOTE: total nodes 4 (8 cores)
NOTE: total memory 62.55G
NOTE: memory 2.15M (0.00%)
96 quit ;
NOTE: PROCEDURE SCOREACCEL used (Total process time):
real time 4.06 seconds
cpu time 0.04 seconds
We can observe that we are running the SAS Embedded Process. Under the covers, a Teradata Stored Procedure is called to score the data. The Teradata output table will contain the output variables from the model including the prediction information.
As you have probably noticed, we saw in the last step what Teradata SQL command SAS was sending to Teradata to score the data using the SAS Embedded Process. We can definitely use this command directly in Teradata as well:
CALL SAS_SYSFNLIB.SAS_SCORE_EP(
'INQUERY=SELECT * FROM "gelindb"."hmeq_prod"',
'MODELTABLE="gelindb"."sas_model_table"',
'MODELNAME=GradientBoosting_Code',
'OUTTABLE="gelindb"."hmeq_prod_code_scored_from_bteq"',
'OUTKEY=RECORD_PK',
'OPTIONS=');
In this article, we have illustrated how a SAS user can design a model and run it in Teradata without moving data from a code perspective.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.