Hi, I'm looking to conditionally update data in a table based on matching columns/rows. Hopefully examples should make it clearer.
Data Table (Master):
ID | Store_Number | Local_Cust_ID | Global_Cust_ID | Record_Start_Date | Record_End_Date | Is_Current |
1 | 987 | 78 | 1989 | 11APR2017:11:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
2 | 321 | 54 | 1989 | 11APR2017:09:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
3 | 654 | 64 | 4589 | 11APR2017:15:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
4 | 321 | 20 | 5243 | 11APR2017:14:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
5 | 987 | 14 | 6547 | 11APR2017:15:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
I want to be able to feed in say the below table (source)
Store_Number | Local_Cust_ID | Global_Cust_ID |
987 | 78 | 1989 |
654 | 64 | 4589 |
And where all three columns match, then the Record End Date and Is Current for that row will be updated. giving me:
ID | Store_Number | Local_Cust_ID | Global_Cust_ID | Record_Start_Date | Record_End_Date | Is_Current |
1 | 987 | 78 | 1989 | 11APR2017:11:49:46.850 | 25APR2017:18:24:46.850 | 0 |
2 | 321 | 54 | 1989 | 11APR2017:09:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
3 | 654 | 64 | 4589 | 11APR2017:15:49:46.850 | 25APR2017:18:24:46.850 | 0 |
4 | 321 | 20 | 5243 | 11APR2017:14:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
5 | 987 | 14 | 6547 | 11APR2017:15:49:46.850 | 01JAN5999:00:00:00.000 | 1 |
I'd like to do this through transformations rather than code. I've tried the update transformation which is frankly baffling me as it only seems to let you select columns to match on in the output table not the input which doesn't make sense to me (I want to match an input against an output). The Merge transformation looks promising but doesn't work on SAS tables, and for our setup apparently won't work on SQL tables either as we set the libraries up as ODBC.
Surely this is a relatively basic task so I'm baffled why it's so difficult. Essentially this is what I want. When these three conditions are met:
Source.Store_Number = Master.Store_Number
Source.Local_Cust_ID = Master.Local_Cust_ID
Source.Global_Cust_ID = Master.Global_Cust_ID
Then Update Record_End_Date to %SYSFUNC(DATETIME()) and Is_Current to 0 in the Master table
A Type 2 won't work as I won't have changes, I just want the record closing without a new one been made, it also needs to be on all three values as none are unique.
Thanks
MRDM,
The uniqueness [and non-standardness] of the task is that a) as you mentioned, you have no change indicator [which is not typical], and b) you want to do very specific non-SCD type of transformation [close the record, recording the end date as a change to the record, but not keep the history]. This makes the task *slightly* more challenging.
First off, the Update, Merge, and most of the other DI transformations in the SQL group (except Join) are for in-database work and only with certain RDBMS (and, as you noted, not ODBC). The proper SQL code to perform this task would be an UPDATE, but the only transformation we have to work with here is Join -- so we would end up hand-coding the solution (which you didn't want to do).
It could also be done in SAS Data Step fairly easily, but again this requires hand-coding.
So we have to convince DI Studio of the right thing to do. Here's how I accomplished it [I've attached an image of the flow].
- I added a SQL Join {xform #1} to simply add a column named CLOSE to the SOURCE table giving it the value 1.
- I added a SQL Join {xform #2} to join (a LEFT join of MASTER and SOURCE) ensuring that CLOSE flows through on the matching records. This effectively tags MASTER records as needed to be closed.
- I added a (very powerful!) Data Validation {xform #3} node and created a Custom Validation with:
condition: close=1
if condition is true:
is_current=0
record_end_date=datetime()
This does the transformations as you need them.
- I added a Table Loader {xform #4} writing out to the Master_Updated table (you could overwrite the MASTER if desired) so that I can drop the CLOSE column.
I'm sure there are other ways to accomplish this as well. But this *was* accomplished with only point-and-click and no hand-coding. Thats the nice thing about DI Studio, it usually gives you many different ways tasks can be accomplished.
Were you using DB2/Oracle/SQL Server/Sybase/Teradata, the Update transformation would (by far) be the correct way to go.
Steve
I believe at least with a recent version of DIS the SQL UPDATE transformation should allow you to do what you're after.
What I'm normally doing when I'm not sure how to set-up stuff in DIS: I first figure out how the generated code should look like and then try and click it together in DIS while checking in the generated code if I'm getting what I want.
For your use case the code I'd go for is a SQL Update with an EXIST clause. Here the test code:
data master;
input ID Store_Number Local_Cust_ID Global_Cust_ID (Record_Start_Date Record_End_Date) (:datetime.) Is_Current;
format Record_Start_Date Record_End_Date datetime21.;
datalines;
1 987 78 1989 11APR2017:11:49:46.850 01JAN5999:00:00:00.000 1
2 321 54 1989 11APR2017:09:49:46.850 01JAN5999:00:00:00.000 1
;
run;
data trans;
input Store_Number Local_Cust_ID Global_Cust_ID;
datalines;
987 78 1989
654 64 4589
;
run;
proc sql;
update master as m
set
Record_End_Date="%sysfunc(datetime(),datetime20.)"dt,
Is_Current=0
where exists
(
select * from trans as t
where m.Store_Number=t.Store_Number and
m.Local_Cust_ID=t.Local_Cust_ID and
m.Global_Cust_ID=t.Global_Cust_ID
)
;
quit;
Using DIS 4.901 I was able to generate the same code as follows:
1. Job Flow
2. SQL Update transformation:
3. Subquery (highlighted above):
Generated code:
Let me know if that worked for you.
P.S: If you're using an ODBC connection then - depending where your data reside (SAS or database) - it might be worth to also play around with all the ODBC settings and pass-through SQL.
If you go for pass-through SQL then make sure that passing in the datetime works. You need eventually to use a database specif function to set the current date - just check the generated code and if the %sysfunc(datetime()...) bit appears in the pass-through SQL block then replace this in the expression with a database function which returns the current datetime.
Good find Patrick!
I didn't realize this functionality had been changed. It looks like the DI Studio help doc needs to be changed to match this functionality. I'll be sure to let them know.
Steve
Oh, and I just checked. You could also use the Table Loader transformation for this task - but it generates less efficient code than the SQL Update.
Below worked for me:
And here the desired and correct result (once I've run a valid test....)
Thanks for the responses and apologies for the delay, so far I've tried the update method (as it was partly setup) and it has worked perfectly. It seems I was missing a few elements (I'd only linked my transaction table). I'd also never used subqueries before, having normally just done advanced and made my own so I've learnt quite a few new things which is great.
I'm hoping that over the next few days I'll get chance to go through and try the rest of the methods in this thread just for a learning perspective so I'm aware of all the different routes I could have gone.
Cheers.
You've got subqueries also in the SQL Join transformation. Having said that I'm not a great fan of how the DIS windows behave for sub-query definitions. I'd like to get something like a pop-up window or a split screen for sub-queries so I can keep a graphical overview over the whole SQL while working on a sub-query.
I consider SQL's with sub-queries also much harder to debug so I normally try to keep my SQLs as simple as possible.
Given that SQL views are in a way nothing else than encapsulated SQL code what I'm normally doing if possible, is to have multiple nodes and to define the "sub-queries" upfront in their own SQL nodes creating a view - and then use these views in the downstream "master" query. This allows me to unit test and debug these "sub-queries" separately.
This is of course not always possible (ie. for an EXIST clause) and if interfacing with a database you want also to make sure that the SAS/Access engine doesn't get confused and is still able to send as much of the full SQL as possible to the database (use options sastrace=',,,d' sastraceloc=saslog nostsuffix; in such a case to verify in the log what part of the SQL gets actually sent to the database for in-database processing).
If going for this "upfront view" approach also use the FEEDBACK option for your main query.
proc sql;
create view v_class as
select *
from sashelp.class
;
quit;
proc sql feedback;
create table test as
select a.*
from
sashelp.class as a
inner join
v_class as b
on a.name=b.name
;
quit;
Using the FEEDBACK option you then see in the log what SAS actually makes out of your code.
NOTE: Statement transforms to: select A.Name, A.Sex, A.Age, A.Height, A.Weight from SASHELP.CLASS A inner join ( select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height, CLASS.Weight from SASHELP.CLASS ) on A.Name = CLASS.Name;
And as DIS flow above would look like:
And you can set the FEEDBACK option here (as part of SQL Join node "Join with Class"):
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.