SAS Cloud Analytic Services offers a variety of capabilities regarding database data: loading data from databases using serial, multi-node and parallel methods, saving CAS data back to databases, offloading some SQL queries from CAS using FedSQL, etc.
CAS write-back to database is a great feature but we can extend it with traditional SAS/ACCESS capabilities to make it even better. Indeed, updating an existing database table directly from CAS is not available yet.
However, we have options to circumvent this since SAS Viya not only leverages CAS data connectors capabilities but also traditional SAS/ACCESS features through the use of SAS compute services. So, we can easily combine CAS saving mechanisms with SAS/ACCESS PROC SQL implicit or explicit pass-through. And SAS Job Flow Scheduler makes it easier to orchestrate them.
The following flow depicts a possible data integration use case:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
For steps 1-2-3, a simple and minimal example would be the following SAS code. Of course, you can take advantage of this phase to add any task that SAS offers to transform your data, improve it, enrich it or apply analytics on it. Finally, you will save the transient tables in the database staging area, assuming you have one. This example relies on Oracle.
cas mysession ; libname casdm cas caslib="dm" ; /* Load master Oracle table */ proc casutil ; load incaslib="dm_oradm" casdata="US_CUSTOMERS" outcaslib="dm" casout="us_customers" ; quit ; /* Load delta CSV file */ proc casutil ; load incaslib="dm" casdata="US_customers_updates.csv" outcaslib="dm" casout="us_customers_delta" ; quit ; /* Identify records to insert, update or delete */ data casdm.us_customers_inserts casdm.us_customers_updates casdm.us_customers_deletes ; merge casdm.us_customers(in=a) casdm.us_customers_delta(in=b) ; by customerid ; if b and not a then output casdm.us_customers_inserts ; if a and b and active=1 then output casdm.us_customers_updates ; if a and b and active=0 then output casdm.us_customers_deletes ; drop active ; run ; /* Save the CAS tables to the database */ proc casutil incaslib="dm" outcaslib="dm_oradm" ; save casdata="us_customers_inserts" casout="US_CUSTOMERS_INSERTS_STAGE" ; save casdata="us_customers_updates" casout="US_CUSTOMERS_UPDATES_STAGE" ; save casdata="us_customers_deletes" casout="US_CUSTOMERS_DELETES_STAGE" ; list files incaslib="dm_oradm" ; quit ; cas mysession terminate ;
For the final step 4, you will use traditional SAS/ACCESS code to update the master table with the staging tables contents.
options dbidirectexec sastrace=",,,d" sastraceloc=saslog ; libname myora oracle user="myuser" password="XXXXXX" path="//mydb.sas.com:1521/xe" schema="dw" ; /* DELETE - implicit */ proc sql ; delete from myora.us_customers where customerid in (select customerid from myora.us_customers_deletes_stage) ; quit ; /* INSERT - implicit */ proc sql ; insert into myora.us_customers select * from myora.us_customers_inserts_stage ; quit ; /* UPDATE - explicit */ proc sql ; connect using myora as ora ; execute( update us_customers t1 set ("first_name", "last_name", "company_name", "address", "city", "county", "state", "zip", "phone1", "phone2", "email", "web") = (select t2."first_name", t2."last_name", t2."company_name", t2."address", t2."city", t2."county", t2."state", t2."zip", t2."phone1", t2."phone2", t2."email", t2."web" from us_customers_updates_stage t2 where t1."customerid"=t2."customerid") where exists(select 1 from us_customers_updates_stage t2 where t1."customerid"=t2."customerid") ) by ora ; disconnect from ora ; quit ; /* Clean Staging Tables */ proc sql ; drop table myora.US_CUSTOMERS_INSERTS_STAGE ; drop table myora.US_CUSTOMERS_UPDATES_STAGE ; drop table myora.US_CUSTOMERS_DELETES_STAGE ; quit ;
This has the advantages of not moving data through the SAS session. Data is moved in CAS, then saved back in the database from CAS, and the final update is orchestrated from a SAS session using push-down instructions.
There are many variants to achieve the final update, whether you know well the proprietary SQL extensions of the database you are accessing or not. It also depends on the database. In Oracle, I could have used the SQL MERGE statement to streamline the full update into one single operation with one single update table.
Or, you can simply rely on SAS implicit pass-through (check the DBIDIRECTEXEC option) to get it done.
Assuming you saved the 2 programs in SAS Folders, you can easily create 2 jobs and orchestrate them in a job flow in the new “Jobs and Flows” web application unlocked by the “SAS Job Flow Scheduler on SAS Viya” license, so that you can run the final update only if the first phase ran successfully.
You can also use SAS Data Integration Studio to manage the whole process and take advantage of dedicated database table loader transforms to perform the final step.
Thanks for reading.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.