BookmarkSubscribeRSS Feed
MRDM
Obsidian | Level 7

Hi, I'm looking to conditionally update data in a table based on matching columns/rows. Hopefully examples should make it clearer.

 

Data Table (Master):

 

IDStore_NumberLocal_Cust_IDGlobal_Cust_IDRecord_Start_DateRecord_End_DateIs_Current
198778198911APR2017:11:49:46.85001JAN5999:00:00:00.0001
232154198911APR2017:09:49:46.85001JAN5999:00:00:00.0001
365464458911APR2017:15:49:46.85001JAN5999:00:00:00.0001
432120524311APR2017:14:49:46.85001JAN5999:00:00:00.0001
598714654711APR2017:15:49:46.85001JAN5999:00:00:00.0001

 

 

I want to be able to feed in say the below table (source)

 

Store_NumberLocal_Cust_ID Global_Cust_ID 
987781989
654644589

 

And where all three columns match, then the Record End Date and Is Current for that row will be updated. giving me:

 

IDStore_NumberLocal_Cust_IDGlobal_Cust_IDRecord_Start_DateRecord_End_DateIs_Current
198778198911APR2017:11:49:46.85025APR2017:18:24:46.8500
232154198911APR2017:09:49:46.85001JAN5999:00:00:00.0001
365464458911APR2017:15:49:46.85025APR2017:18:24:46.8500
432120524311APR2017:14:49:46.85001JAN5999:00:00:00.0001
598714654711APR2017:15:49:46.85001JAN5999:00:00:00.0001

 

 

I'd like to do this through transformations rather than code. I've tried the update transformation which is frankly baffling me as it only seems to let you select columns to match on in the output table not the input which doesn't make sense to me (I want to match an input against an output). The Merge transformation looks promising but doesn't work on SAS tables, and for our setup apparently won't work on SQL tables either as we set the libraries up as ODBC.

 

Surely this is a relatively basic task so I'm baffled why it's so difficult. Essentially this is what I want. When these three conditions are met:

Source.Store_Number = Master.Store_Number

Source.Local_Cust_ID = Master.Local_Cust_ID

Source.Global_Cust_ID = Master.Global_Cust_ID

 

Then Update Record_End_Date to %SYSFUNC(DATETIME()) and Is_Current to 0 in the Master table

 

A Type 2 won't work as I won't have changes, I just want the record closing without a new one been made, it also needs to be on all three values as none are unique.

 

Thanks

6 REPLIES 6
SteveED
SAS Employee

MRDM,

 

 The uniqueness [and non-standardness] of the task is that a) as you mentioned, you have no change indicator [which is not typical], and b) you want to do very specific non-SCD type of transformation [close the record, recording the end date as a change to the record, but not keep the history]. This makes the task *slightly* more challenging.

 

 First off, the Update, Merge, and most of the other DI transformations in the SQL group (except Join) are for in-database work and only with certain RDBMS (and, as you noted, not ODBC). The proper SQL code to perform this task would be an UPDATE, but the only transformation we have to work with here is Join -- so we would end up hand-coding the solution (which you didn't want to do). 

 

 It could also be done in SAS Data Step fairly easily, but again this requires hand-coding. 

 

 So we have to convince DI Studio of the right thing to do. Here's how I accomplished it [I've attached an image of the flow].

- I added a SQL Join {xform #1} to simply add a column named CLOSE to the SOURCE table giving it the value 1.

- I added a SQL Join {xform #2} to join (a LEFT join of MASTER and SOURCE) ensuring that CLOSE flows through on the matching records. This effectively tags MASTER records as needed to be closed.

- I added a (very powerful!) Data Validation {xform #3} node and created a Custom Validation with:

             condition: close=1

             if condition is true:

                   is_current=0

                   record_end_date=datetime()

  This does the transformations as you need them.

- I added a Table Loader {xform #4} writing out to the Master_Updated table (you could overwrite the MASTER if desired) so that I can drop the CLOSE column.

 

I'm sure there are other ways to accomplish this as well. But this *was* accomplished with only point-and-click and no hand-coding.  Thats the nice thing about DI Studio, it usually gives you many different ways tasks can be accomplished.  

 

Were you using DB2/Oracle/SQL Server/Sybase/Teradata, the Update transformation would (by far) be the correct way to go.

 

Steve


DI record close out.JPG
Patrick
Opal | Level 21

@MRDM

I believe at least with a recent version of DIS the SQL UPDATE transformation should allow you to do what you're after.

 

What I'm normally doing when I'm not sure how to set-up stuff in DIS: I first figure out how the generated code should look like and then try and click it together in DIS while checking in the generated code if I'm getting what I want.

 

For your use case the code I'd go for is a SQL Update with an EXIST clause. Here the test code:

data master;
  input ID Store_Number Local_Cust_ID Global_Cust_ID (Record_Start_Date Record_End_Date) (:datetime.) Is_Current;
  format Record_Start_Date  Record_End_Date datetime21.;
  datalines;
1 987 78 1989 11APR2017:11:49:46.850 01JAN5999:00:00:00.000 1
2 321 54 1989 11APR2017:09:49:46.850 01JAN5999:00:00:00.000 1
;
run;

data trans;
  input Store_Number Local_Cust_ID Global_Cust_ID;
  datalines;
987 78 1989
654 64 4589
;
run;

proc sql;
  update master as m
    set 
      Record_End_Date="%sysfunc(datetime(),datetime20.)"dt,
      Is_Current=0
    where exists
      (
        select * from trans as t
          where m.Store_Number=t.Store_Number and
                m.Local_Cust_ID=t.Local_Cust_ID and
                m.Global_Cust_ID=t.Global_Cust_ID
      )
  ;
quit;
 

Using DIS 4.901 I was able to generate the same code as follows:

1. Job Flow

Capture.PNG

 

2. SQL Update transformation:

Capture.PNG

 

3. Subquery (highlighted above):

Capture.PNG

Capture.PNG

Capture.PNG

 

Generated code:

Capture.PNG

 

Let me know if that worked for you.

 

P.S: If you're using an ODBC connection then - depending where your data reside (SAS or database) - it might be worth to also play around with all the ODBC settings and pass-through SQL.

If you go for pass-through SQL then make sure that passing in the datetime works. You need eventually to use a database specif function to set the current date - just check the generated code and if the %sysfunc(datetime()...) bit appears in the pass-through SQL block then replace this in the expression with a database function which returns the current datetime. 

SteveED
SAS Employee

Good find Patrick!

 

I didn't realize this functionality had been changed. It looks like the DI Studio help doc needs to be changed to match this functionality. I'll be sure to let them know.

 

Steve

Patrick
Opal | Level 21

@MRDM

Oh, and I just checked. You could also use the Table Loader transformation for this task - but it generates less efficient code than the SQL Update.

 

Below worked for me:

 

Capture.PNG

 

Capture.PNG

 

Capture.PNG

 

 

And here the desired and correct result (once I've run a valid test....)

 

Capture.PNG

MRDM
Obsidian | Level 7

Hi @SteveED @Patrick

 

Thanks for the responses and apologies for the delay, so far I've tried the update method (as it was partly setup) and it has worked perfectly. It seems I was missing a few elements (I'd only linked my transaction table). I'd also never used subqueries before, having normally just done advanced and made my own so I've learnt quite a few new things which is great.

 

I'm hoping that over the next few days I'll get chance to go through and try the rest of the methods in this thread just for a learning perspective so I'm aware of all the different routes I could have gone.

 

Cheers.

Patrick
Opal | Level 21

@MRDM

You've got subqueries also in the SQL Join transformation. Having said that I'm not a great fan of how the DIS windows behave for sub-query definitions. I'd like to get something like a pop-up window or a split screen for sub-queries so I can keep a graphical overview over the whole SQL while working on a sub-query.

I consider SQL's with sub-queries also much harder to debug so I normally try to keep my SQLs as simple as possible.

 

Given that SQL views are in a way nothing else than encapsulated SQL code what I'm normally doing if possible, is to have multiple nodes and to define the "sub-queries" upfront in their own SQL nodes creating a view - and then use these views in the downstream "master" query. This allows me to unit test and debug these "sub-queries" separately.

This is of course not always possible (ie. for an EXIST clause) and if interfacing with a database you want also to make sure that the SAS/Access engine doesn't get confused and is still able to send as much of the full SQL as possible to the database (use options sastrace=',,,d' sastraceloc=saslog nostsuffix; in such a case to verify in the log what part of the SQL gets actually sent to the database for in-database processing).

 

If going for this "upfront view" approach also use the FEEDBACK option for your main query.

 

proc sql;
  create view v_class as
    select *
    from sashelp.class
    ;
quit;

proc sql feedback;
  create table test as
    select a.*
      from 
        sashelp.class as a
          inner join
        v_class as b
        on a.name=b.name
    ;
quit;

 

Using the FEEDBACK option you then see in the log what SAS actually makes out of your code.

NOTE: Statement transforms to:

        select A.Name, A.Sex, A.Age, A.Height, A.Weight
          from SASHELP.CLASS A inner join 
               ( select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height, CLASS.Weight
                   from SASHELP.CLASS
               ) on A.Name = CLASS.Name;

 

And as DIS flow above would look like:

Capture.PNG

 

And you can set the FEEDBACK option here (as part of SQL Join node "Join with Class"):

Capture.PNG

 

 

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 5424 views
  • 8 likes
  • 3 in conversation