SCD Type 2 - DI - Beginning Date and Business Key Questions

jwhite · Posted 12-17-2015 10:53 AM

I seem to be having difficulty getting this SCD Type 2 Transformation to do what I think it should. I have two questions.

1) We use Beginning Date and End Date in our target fact table for purposes of historical record. In the source file, we have a new Begin Date, so I want to close out the current End Date on the target and have a new row that opens with this new Begin Date.

2) We have a Business Key (Student_ID) and a Surrogate Key (STD_KEY). On these new records from the source, it is the Student_ID that has new values, so I would think that is the column I need to check for changes on. However, it is the Business Key. I want to retain the exisiting Surrogate Key.

Ultimately where we have:

STD_KEY 02

STUDENT_ID 1234

BEGIN_DATE 01JAN2006

END_DATE 01JAN5999

I want the new source data to generate a second row for this Surrogate Key with the new Student ID value with both rows looking like:

STD_KEY 02

STUDENT_ID 1234

BEGIN_DATE 01JAN2006

END_DATE 14JAN2008

STUD_KEY 02

STUDENT_ID 5678

BEGIN_DATE 15JAN2008

END_DATE 01JAN5999

What I'm getting is the following that will come across as two totally unrelated records on our target table:

STD_KEY 02

STUDENT_ID 1234

BEGIN_DATE 01JAN2006

END_DATE 01JAN5999

STUD_KEY 03

STUDENT_ID 5678

BEGIN_DATE 15JAN2008

END_DATE 01JAN5999

This seems like it should be very easy to do. What I'm doing though is adding more rows with new Surrogate Keys to my target table and I'm not closing out any dates.

Thoughts?

jwhite · Posted 12-17-2015 11:44 AM

Ok, I'm carrying the new STUDENT_ID and BEGIN_DATE from source to target through SCD TYPE 2 node mappings so now getting:

STD_KEY 02

STUDENT_ID 1234

BEGIN_DATE 01JAN2006

END_DATE 01JAN5999

STUD_KEY 02

STUDENT_ID 5678

BEGIN_DATE 15JAN2008

END_DATE 01JAN5999

Which is a lot closer, but the dates aren't closing out. And really, this doesn't seem to feel more like any actual change capture but more like an 'append'.

Suggestions?

jwhite · Posted 12-17-2015 12:25 PM

Alright...I think I've got it working correctly. This is how I did it..and if this isn't correct, do tell.

Tabs:

Change Tracking - using BEGIN_DATE and END_DATE in the column name. BEGIN_DATE and '01JAN5999'D in the EXPRESSION

Business Key - STD_KEY

Generated Key - none

Detect Changes - selected columns is blank, so all

Mappings - mapped STD_KEY, STUDENT_ID, and BEGIN_DATE. _not_ END_DATE as that is generated in the exrpession of the change tracking tab.

So, it looks like it appends the new source records to the target table, but closes out the END_DATE of exisiting records and uses the day before the mapped BEGIN_DATE as the new END_DATE of the exisiting records.

This is the result I want, but am I using the transformation correctly?

Thanks!

LinusH · Posted 12-17-2015 03:27 PM

First I'm not sure if you are using the correct terminology, or if your dimensional model is "correct".
Using type 2 on a fact is rare but can in some cases be valid use case. Like if you wish to save space on rarely changed balance values. But need to see more attributes to understand.

If this is a fact, you should only bring the dimension key ( which you got from a lookup earlier in the job).

But if you are in fact 😛 loading a dimension your I'd is the business key, and std_key is your retained surrogate key.

Data never sleeps

jwhite · Posted 12-17-2015 03:33 PM

Ha! You are correct. This is not a FACT, but a DIM table.

I was confused in my initial post as I was attempting to use this transformation based on readings I was doing from a SAS DI Studio training book. In their example they load a FACT table.

Now knowing this is a DIM table, does the use of the transformation look correct?

Thanks!

LinusH · Posted 12-17-2015 04:33 PM

Is your data "real" or sample data?
I'm confused about the origin of your surrogate key. Is it delivered from the source or maintained by SAS?
And it sounds odd if you identified a business key, and that key can change. Doesn't sound right... The idea with a business key is that it doesn't change.

Data never sleeps

jwhite · Posted 12-17-2015 04:50 PM

The surrogate key is generated by SAS. I have retained it from an earlier lookup.

The business key are ssns and should change, correct. however, the source data we're receiving are 'fixes' or corrected data that was sent to us. Basically, the ssns (the business key) was sent incorrectly to us in our initial load.

The hope is that this SCD Type 2 transformation will close out the old ssns and open records that have the new ones as the business key.

There is other data in these records (date of births, etc) but we are only looking to update the ssns at this time.

Thanks!

LinusH · Posted 12-18-2015 02:20 AM

Ok it seems that you don't have a standardised flow here. To be able to give advice I think one needs to review the whole ETL flow and design, something that it's not optimal to do here. I sugest that you find someone that can consult your design in more detail.

Usually, type 2 dimensions holds the logic for key management. And corrections to business-DW key pairs is managed in separate jobs/flows.

Data never sleeps

jwhite · Posted 12-18-2015 09:10 AM

I understand.

However, if the business key were to change, would we not still need to open and close the Begin and End Dates so our programs and jobs can find the 'current' records?

What is a more standard way to updated business keys? I think I'm pretty familiar with the DI Studio so am open to trying another approach.

Thanks!

LinusH · Posted 12-19-2015 11:01 AM

Ok, it's Christmas soon, so here goes (even if DI design it's bit abstract to discuss in detail as of my earlier post).

A complete data (warehouse) architecture would include staging area, detail data store and data/information marts.

Star schemas are a preferred design for data marts. Some think tey're good for the detail data store, but I don't agree.

The detail data store can logically be separated by a core area, where only hard rules have been applied (data types, naming standard etc). And a business area where business rules have been applied. I would say normalizing business keys would be stored/handled in that area.

I assume that you get a separate file/delivery with conversion/matching business keys, separately from the normal attributes.

Surrogate keys created in the detail data store should be inherited to the data marts. Two cases:

Dimension type 2: don't use retained key, since that adds complexity in the join conditions to fact tables. Use the detail data store surrogate key as business
DImension type 1: if you have the same granularity as in the detail data store, you could use the same key (and again as business key). No need to create a new key in the data mart.

So what about your changing business keys? If you have a detail data store, you should have built separate surrogate key tables. You could build them using the business key as PK, have your regular flow create new entries when new business keys arrive. And then have an alternate flow for your old-new/wrong-correct business keys pairs update that surrogate table in separate job.

Another option is to look an be inspired by data vault modelling. There you should use a surrogate key table as well, here called "Hub". Then you have a relational table connect both ways to that Hub, as so called Same-as-Link. Google for examples if that sounds interesting.

Cheers

Linus

Data never sleeps

Patrick · Posted 12-21-2015 03:29 AM

The business key identifies a record. That's what you use from source to join data with the target and then to identify if any of the other variables under change tracking have changed.

Because the business key is used for joining you can't have it under change tracking; certainly not with the OOTB DIS SCD2 loader.

Now for your once off job and if the business key is changing (but is actually identifying "the same record"): Which other variable could you use to uniquely identify the record? Is there some alternate key?

If there is an alternate key then you could could use this alternate key as business key and the original business key as a variable under change tracking. This way you could set-up a once off job and load your data using SCD2; and records where the original business key changes would get expired and a new record would get created.

As for @LinusHrecommendation to never use retained keys: I can't really agree with this. Using retained keys makes maintaining bridge tables so much easier. With surrogate keys you need to create new relationship records in bridge tables whenever something in the primary tables changes; even if there is no change in the relationship itself. This cost me a lot of pain and a lot of additional processing in past projects.

LinusH · Posted 12-21-2015 04:07 AM

Ok, "never" is a quite strong word...

But, I always try to avoid building bridge tables in the access/data mart layer - they are simply too complicated to use for most users. This can be done by having a structured detail data store underneath the mart layer, and by creating denormalized information marts.

Type 2 dimensions with a normal 1-M relationship to the fact table is a much more common scenario (in my designs at least). So I wish to simplify join conditions for the users, hence the no retained keys policy.

Data never sleeps

jwhite · Posted 12-21-2015 12:25 PM

Thanks, LinusH and Patrick.

You both have given me plenty to investigate from here. Also, from your explanations, I think I may have mis-used terminology in reference to the Business Key.

Thanks again for your detailed responses and have a great holiday!

-Jeff

SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Re: SCD Type 2 - DI - Beginning Date and Business Key Questions

Ready to join fellow brilliant minds for the SAS Hackathon?