This is the third and conclusive post in this series. For an introduction to the different slowly changing dimension types, and the basis for these articles, see my previous articles:
In this third part, we’ll discover how we handle the SCD types 5 through 7. They are all newly defined by Ross & Kimball, but some of the concepts are already familiar to amny of you, as real world situations.
SCD Type 5 is a type 1 dimension with a type 4 “outrigger” – it has its dimensional key as a type 1 FK in hosting dimension. This means that we need to do a look-up operation, in our case using the continuous values to find the combinational match in the type 4 mini-dimension.
As you can see I chose Join for the look-up, since it can handle BETWEEN – AND join criteria:
And in the Type 1 Loader, the mini-dimension surrogate key is handled just like any other attribute when loading the type 5 dimension table.
In this type, we are mixing all three of the original type 1, 2 and 3 types. So it might feel a bit complicated, so we’ll take one step at a time.
For this exercise, we start with a SCD Type 2 Loader, since this is the foundation of the type 6. We use the same business as in corresponding earlier examples. Also, we use a surrogate key.
For the type 1 attributes, we simply select them in the Type 1 Columns tab.
To handle the type 3 columns, we reuse the concept from the type 3 example above.
In the job we first extract the current records in the dimension. Then in a join with the transactions compare if there are any change between the values in the type 1 columns (that have a corresponding type 3 columns). But, we don’t filter out records that haven’t changed. This is because we have other type 2 columns in the table, and we need to forward all source records for the type 2 (and 1) comparison in the SCD Type 2 Loader.
So, at last, the finale. For a type 7 concept, we split up the type 6 dimension into two separate dimensions, one type 2, and one type 1. Even if we don’t have any specific type 3 columns in this concept, this solution delivers the same functionality. Of course, if there is a user requirement, you could introduce type 3 columns in the type 1 dimension (as shown above).
Even if we have to load two dimensions, that process is simpler, by using the standard DI Studio transformations. Since we already created type 1 and type 2 dimensions, I won’t bore you with a reprise. Instead, let us see how we connect our dimensions to the fact table.
For this exercise, I’ve created random sales figures based on the SASHELP.CARS table.
I might have been ungenerous by not supplying any data models, but it’s quite easy to reengineer a fact table load job to a star schema model. It’s obvious which table holds the facts, and which tables are the dimensions. To see details, just click on an object and see columns/mappings in the details pane. For completeness, this is how to set the parameters in the Lookup. The example for the SCD Type 2 dimension:
And then, just map the business keys:
Select the dimension key, and propagate it to the target (if not already defined). The naming is just for clarity in this example. It should probably have a more business friendly name in a real world scenario.
Finally, you need to filter out the appropriate (current) row to get the correct dimension key. If you need to load records for more than one update cycle of the dimension, you can’t use the Lookup transformation, since the filter cannot be data driven. In those situations, you can use SQL join using BETWEEN-AND join conditions.
The type 1 lookup looks the same, but there is no need for filtering (since there is just one row for each business/surrogate key).
I think what I have shown here is that SAS DI Studio is a capable and a flexible tool. It handles these 8 different scenarios with surprisingly little bending, and no user written code. Of course, I have used simple demo data, and we didn’t cover the whole ETL process. But still, I think it shows the power of SAS Data Integration technology.
And as with all type of development, situations can be solved in numerous ways. If you have suggestions for alternative ways to this, I’ll be happy to hear from you!
Ralph Kimball, Margy Ross: The Data Warehouse Toolkit (3rd edition)
For help with contnts, disposistion and language, my collegeues at Infotrek, and