Updating a database table with SAS Viya

2 Likes

SAS Cloud Analytic Services offers a variety of capabilities regarding database data: loading data from databases using serial, multi-node and parallel methods, saving CAS data back to databases, offloading some SQL queries from CAS using FedSQL, etc.

CAS write-back to database is a great feature but we can extend it with traditional SAS/ACCESS capabilities to make it even better. Indeed, updating an existing database table directly from CAS is not available yet.

However, we have options to circumvent this since SAS Viya not only leverages CAS data connectors capabilities but also traditional SAS/ACCESS features through the use of SAS compute services. So, we can easily combine CAS saving mechanisms with SAS/ACCESS PROC SQL implicit or explicit pass-through. And SAS Job Flow Scheduler makes it easier to orchestrate them.

Data Lifecycle… an example

The following flow depicts a possible data integration use case:

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

In this example, the application receives various customer changes in a flat file, combining potential new customers, customers who have been updated or customers who cancelled. This file is read and loaded in CAS, so is the master CUSTOMER table, coming from a database.
CAS is used to combine those 2 streams, add value to the data, cleanse the new data, identify customers to be inserted, to be updated and to be removed in/from the master database table.
CAS cannot update the database table directly, so the resulting tables are directly saved from CAS to the database staging area.
The final step is to apply the changes to the database table. This can be done using SQL orchestrated from SAS.

In practice…

For steps 1-2-3, a simple and minimal example would be the following SAS code. Of course, you can take advantage of this phase to add any task that SAS offers to transform your data, improve it, enrich it or apply analytics on it. Finally, you will save the transient tables in the database staging area, assuming you have one. This example relies on Oracle.

cas mysession ;

libname casdm cas caslib="dm" ;

/* Load master Oracle table */
proc casutil ;
   load incaslib="dm_oradm" casdata="US_CUSTOMERS" outcaslib="dm" casout="us_customers" ;
quit ;

/* Load delta CSV file */
proc casutil ;
   load incaslib="dm" casdata="US_customers_updates.csv" outcaslib="dm" casout="us_customers_delta" ;
quit ;

/* Identify records to insert, update or delete */
data casdm.us_customers_inserts casdm.us_customers_updates casdm.us_customers_deletes ;
   merge casdm.us_customers(in=a) casdm.us_customers_delta(in=b) ;
   by customerid ;
   if b and not a then output casdm.us_customers_inserts ;
   if a and b and active=1 then output casdm.us_customers_updates ;
   if a and b and active=0 then output casdm.us_customers_deletes ;
   drop active ;
run ;

/* Save the CAS tables to the database */
proc casutil incaslib="dm" outcaslib="dm_oradm" ;
   save casdata="us_customers_inserts" casout="US_CUSTOMERS_INSERTS_STAGE" ;
   save casdata="us_customers_updates" casout="US_CUSTOMERS_UPDATES_STAGE" ;
   save casdata="us_customers_deletes" casout="US_CUSTOMERS_DELETES_STAGE" ;
   list files incaslib="dm_oradm" ;
quit ;

cas mysession terminate ;

For the final step 4, you will use traditional SAS/ACCESS code to update the master table with the staging tables contents.

options dbidirectexec sastrace=",,,d" sastraceloc=saslog ;
libname myora oracle user="myuser" password="XXXXXX" path="//mydb.sas.com:1521/xe" schema="dw" ;

/* DELETE - implicit */
proc sql ;
   delete from myora.us_customers where customerid in (select customerid from myora.us_customers_deletes_stage) ;
quit ;

/* INSERT - implicit */
proc sql ;
   insert into myora.us_customers select * from myora.us_customers_inserts_stage ;
quit ;

/* UPDATE - explicit */
proc sql ;
   connect using myora as ora ;
   execute(
      update us_customers t1
         set ("first_name", "last_name", "company_name", "address", "city", "county",
               "state", "zip", "phone1", "phone2", "email", "web") =
            (select t2."first_name", t2."last_name", t2."company_name", t2."address", t2."city", t2."county", 
               t2."state", t2."zip", t2."phone1", t2."phone2", t2."email", t2."web" from us_customers_updates_stage t2
               where t1."customerid"=t2."customerid")
         where exists(select 1 from us_customers_updates_stage t2 where t1."customerid"=t2."customerid")
   ) by ora ;
   disconnect from ora ;
quit ;

/* Clean Staging Tables */
proc sql ;
   drop table myora.US_CUSTOMERS_INSERTS_STAGE ;
   drop table myora.US_CUSTOMERS_UPDATES_STAGE ;
   drop table myora.US_CUSTOMERS_DELETES_STAGE ;
quit ;

This has the advantages of not moving data through the SAS session. Data is moved in CAS, then saved back in the database from CAS, and the final update is orchestrated from a SAS session using push-down instructions.

There are many variants to achieve the final update, whether you know well the proprietary SQL extensions of the database you are accessing or not. It also depends on the database. In Oracle, I could have used the SQL MERGE statement to streamline the full update into one single operation with one single update table.

Or, you can simply rely on SAS implicit pass-through (check the DBIDIRECTEXEC option) to get it done.

The cherry on the cake

Assuming you saved the 2 programs in SAS Folders, you can easily create 2 jobs and orchestrate them in a job flow in the new “Jobs and Flows” web application unlocked by the “SAS Job Flow Scheduler on SAS Viya” license, so that you can run the final update only if the first phase ran successfully.

You can also use SAS Data Integration Studio to manage the whole process and take advantage of dedicated database table loader transforms to perform the final step.

Thanks for reading.

idziemianczyk · ‎04-26-2021

In the step 4 we have hardcoded SAS9 to Oracle connection. It is possible to avoid it defining the connection using the Environment Manager then reuse it for pass-through queries?

libname myora oracle user="myuser" password="XXXXXX" path="//mydb.sas.com:1521/xe" schema="dw" ;

NicolasRobert · ‎05-10-2021

Hello @idziemianczyk

It should be possible using the AUTHDOMAIN libname option:

78   libname test postgres authdomain="PGAuth" server="myserver" database="sandbox" schema="public"
78 !  preserve_names=yes ;
NOTE:  Credential obtained from Viya credentials service.
NOTE: Libref TEST was successfully assigned as follows: 
      Engine:        POSTGRES 
      Physical Name: myserver

Then you can use the following SQL statement to reuse the library connection:

connect using myora as ora ;

JackHamilton · ‎05-10-2021

I hadn't been aware of the "connect using" statement until recently. It's good to have - I can put all of my database options in one libname statement at the beginning of the program, instead of having to recode them in every PROC SQL step. It seems to be smart enough to know which libname options apply to CONNECT TO (I don't know how you you can find all of the options that have been applied to a database libname or connection, but I haven't noticed anything missing).