Most people who are familiar with data warehouse concepts knows about the concept of Slowly Changing Dimensions. It’s been a part of the standard toolbox for data warehouse implementations since Ralph Kimball published “The Data Warehouse Toolkit” in the late 90’s. But with time, it came clear that not all business cases could be solved by the original SCD Types (1-3). New techniques have been formulated. Some based on combination of the existing ones, others are new. In 2013, the Kimball Group formulated a new set of SCD Types, now ranging from 0 to 7.
The focus for this article is to use these different SCD types as different loading scenarios. And use them as base for testing how SAS DI Studio can adopt these different techniques. In this article we focus on loading dimensions specifically, even if some of the SCD types can be adopted in other types of tables in a data warehouse. For as more comprehensive description of the different SCD types, see Margy Ross’ article.
We use surrogate keys in each example in our dimensional tables. It not always ideal to use them, but as soon as you have a composite business key, a surrogate key makes sense. If you would like to use hash keys instead, use my article Using SAS DI Studio to Load a Data Vault, Part II (DV2.0) as a starting point.
In this post I’ll setting up the stage with the different SCD types, and will exemplify the first two (types 0 and 1). DI Studio examples for the remaining SCD types will be covered in future posts.
A short summary of the different types. They will be described a bit more under each section.
0. Retain Original. Once written, never change. Time dimensions attributes are typical Type 0. Other uses could be “Original Sales Channel”.
In SAS DI Studio, there are two transformations specifically created for this purpose, the SCD Type 1 and 2 Loaders. Let us now see how we can use them, and other transformations to fulfill each SCD Type. Let me be clear that each scenario is not meant to be fully logical, or common in the real world, they are solely used to demonstrate DI Studio capability.
“Retain Original” – sounds easy. What I will use here is the Surrogate Key Generator transformation.
It uses as input along with the source table, the dimension table (which is unfortunately not shown in the job editor). Now we are only interested to fetch new records (since we will never overwrite). We can use a column in the target table (which should be defined as NOT NULL), which is not present in the source data. This to figure out if the records is coming from the source alone. And then a table loader that uses the Append technique – which is fine since we already have matched and filtered out existing rows in the previous steps.
Overwrite with the current values. Easy, especially when there is a ready-made transformation for this– the SCD Type 1 Loader.
This one is of course part of the DI Studio documentation, so I won’t go into detail here. Just want to mention that it generates a surrogate key for you, if you choose.
This was all that fitted into this first part. Watch the community for details on the other SCD Types in future posts. Until then, let me hear any comments from you about this first post!