I seem to be having difficulty getting this SCD Type 2 Transformation to do what I think it should. I have two questions.
1) We use Beginning Date and End Date in our target fact table for purposes of historical record. In the source file, we have a new Begin Date, so I want to close out the current End Date on the target and have a new row that opens with this new Begin Date.
2) We have a Business Key (Student_ID) and a Surrogate Key (STD_KEY). On these new records from the source, it is the Student_ID that has new values, so I would think that is the column I need to check for changes on. However, it is the Business Key. I want to retain the exisiting Surrogate Key.
Ultimately where we have:
STD_KEY 02
STUDENT_ID 1234
BEGIN_DATE 01JAN2006
END_DATE 01JAN5999
I want the new source data to generate a second row for this Surrogate Key with the new Student ID value with both rows looking like:
STD_KEY 02
STUDENT_ID 1234
BEGIN_DATE 01JAN2006
END_DATE 14JAN2008
STUD_KEY 02
STUDENT_ID 5678
BEGIN_DATE 15JAN2008
END_DATE 01JAN5999
What I'm getting is the following that will come across as two totally unrelated records on our target table:
STD_KEY 02
STUDENT_ID 1234
BEGIN_DATE 01JAN2006
END_DATE 01JAN5999
STUD_KEY 03
STUDENT_ID 5678
BEGIN_DATE 15JAN2008
END_DATE 01JAN5999
This seems like it should be very easy to do. What I'm doing though is adding more rows with new Surrogate Keys to my target table and I'm not closing out any dates.
Thoughts?
Ok, I'm carrying the new STUDENT_ID and BEGIN_DATE from source to target through SCD TYPE 2 node mappings so now getting:
STD_KEY 02
STUDENT_ID 1234
BEGIN_DATE 01JAN2006
END_DATE 01JAN5999
STUD_KEY 02
STUDENT_ID 5678
BEGIN_DATE 15JAN2008
END_DATE 01JAN5999
Which is a lot closer, but the dates aren't closing out. And really, this doesn't seem to feel more like any actual change capture but more like an 'append'.
Suggestions?
Alright...I think I've got it working correctly. This is how I did it..and if this isn't correct, do tell.
Tabs:
Change Tracking - using BEGIN_DATE and END_DATE in the column name. BEGIN_DATE and '01JAN5999'D in the EXPRESSION
Business Key - STD_KEY
Generated Key - none
Detect Changes - selected columns is blank, so all
Mappings - mapped STD_KEY, STUDENT_ID, and BEGIN_DATE. _not_ END_DATE as that is generated in the exrpession of the change tracking tab.
So, it looks like it appends the new source records to the target table, but closes out the END_DATE of exisiting records and uses the day before the mapped BEGIN_DATE as the new END_DATE of the exisiting records.
This is the result I want, but am I using the transformation correctly?
Thanks!
Ha! You are correct. This is not a FACT, but a DIM table.
I was confused in my initial post as I was attempting to use this transformation based on readings I was doing from a SAS DI Studio training book. In their example they load a FACT table.
Now knowing this is a DIM table, does the use of the transformation look correct?
Thanks!
The surrogate key is generated by SAS. I have retained it from an earlier lookup.
The business key are ssns and should change, correct. however, the source data we're receiving are 'fixes' or corrected data that was sent to us. Basically, the ssns (the business key) was sent incorrectly to us in our initial load.
The hope is that this SCD Type 2 transformation will close out the old ssns and open records that have the new ones as the business key.
There is other data in these records (date of births, etc) but we are only looking to update the ssns at this time.
Thanks!
I understand.
However, if the business key were to change, would we not still need to open and close the Begin and End Dates so our programs and jobs can find the 'current' records?
What is a more standard way to updated business keys? I think I'm pretty familiar with the DI Studio so am open to trying another approach.
Thanks!
Ok, it's Christmas soon, so here goes (even if DI design it's bit abstract to discuss in detail as of my earlier post).
A complete data (warehouse) architecture would include staging area, detail data store and data/information marts.
Star schemas are a preferred design for data marts. Some think tey're good for the detail data store, but I don't agree.
The detail data store can logically be separated by a core area, where only hard rules have been applied (data types, naming standard etc). And a business area where business rules have been applied. I would say normalizing business keys would be stored/handled in that area.
I assume that you get a separate file/delivery with conversion/matching business keys, separately from the normal attributes.
Surrogate keys created in the detail data store should be inherited to the data marts. Two cases:
So what about your changing business keys? If you have a detail data store, you should have built separate surrogate key tables. You could build them using the business key as PK, have your regular flow create new entries when new business keys arrive. And then have an alternate flow for your old-new/wrong-correct business keys pairs update that surrogate table in separate job.
Another option is to look an be inspired by data vault modelling. There you should use a surrogate key table as well, here called "Hub". Then you have a relational table connect both ways to that Hub, as so called Same-as-Link. Google for examples if that sounds interesting.
Cheers
Linus
The business key identifies a record. That's what you use from source to join data with the target and then to identify if any of the other variables under change tracking have changed.
Because the business key is used for joining you can't have it under change tracking; certainly not with the OOTB DIS SCD2 loader.
Now for your once off job and if the business key is changing (but is actually identifying "the same record"): Which other variable could you use to uniquely identify the record? Is there some alternate key?
If there is an alternate key then you could could use this alternate key as business key and the original business key as a variable under change tracking. This way you could set-up a once off job and load your data using SCD2; and records where the original business key changes would get expired and a new record would get created.
As for @LinusHrecommendation to never use retained keys: I can't really agree with this. Using retained keys makes maintaining bridge tables so much easier. With surrogate keys you need to create new relationship records in bridge tables whenever something in the primary tables changes; even if there is no change in the relationship itself. This cost me a lot of pain and a lot of additional processing in past projects.
Ok, "never" is a quite strong word...
But, I always try to avoid building bridge tables in the access/data mart layer - they are simply too complicated to use for most users. This can be done by having a structured detail data store underneath the mart layer, and by creating denormalized information marts.
Type 2 dimensions with a normal 1-M relationship to the fact table is a much more common scenario (in my designs at least). So I wish to simplify join conditions for the users, hence the no retained keys policy.
Thanks, LinusH and Patrick.
You both have given me plenty to investigate from here. Also, from your explanations, I think I may have mis-used terminology in reference to the Business Key.
Thanks again for your detailed responses and have a great holiday!
-Jeff
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.