BookmarkSubscribeRSS Feed

SAS Data Integration Studio Capability Test: SCD Types 0 through 7, pt 3

Started ‎04-12-2015 by
Modified ‎10-05-2015 by
Views 2,774

SCDType0-7 3 slim.jpg

 

Introduction

This is the third and conclusive post in this series. For an introduction to the different slowly changing dimension types, and the basis for these articles, see my previous articles:

 

In this third part, we’ll discover how we handle the SCD types 5 through 7. They are all newly defined by Ross & Kimball, but some of the concepts are already familiar to amny of you, as real world situations.

 

SAS DI Studio and SCD Types 5-7

 

SCD Type 5

SCD Type 5 is a type 1 dimension with a type 4 “outrigger” – it has its dimensional key as a type 1 FK in hosting dimension. This means that we need to do a look-up operation, in our case using the continuous values to find the combinational match in the type 4 mini-dimension.

 

Type5_job.JPG

As you can see I chose Join for the look-up, since it can handle BETWEEN – AND join criteria:

Type5_joinCriteria.JPG

And in the Type 1 Loader, the mini-dimension surrogate key is handled just like any other attribute when loading the type 5 dimension table.

 

SCD Type 6

In this type, we are mixing all three of the original type 1, 2 and 3 types. So it might feel a bit complicated, so we’ll take one step at a time.

  • The basis is a type 2 dimension, using business key, dimension surrogate key, and valid_from/to_dttm for change history of records.
  • Then we have one or more type 1 columns: if a change is spotted in the source for a business key, the all column values for that business key, including historical records.
  • To that, we add a type 3 column, which holds the previous value for another attribute.

For this exercise, we start with a SCD Type 2 Loader, since this is the foundation of the type 6. We use the same business as in corresponding earlier examples. Also, we use a surrogate key.

 

For the type 1 attributes, we simply select them in the Type 1 Columns tab.

Type6_type1cols.JPG

To handle the type 3 columns, we reuse the concept from the type 3 example above.

Type6_job.JPG

In the job we first extract the current records in the dimension. Then in a join with the transactions compare if there are any change between the values in the type 1 columns (that have a corresponding type 3 columns). But, we don’t filter out records that haven’t changed. This is because we have other type 2 columns in the table, and we need to forward all source records for the type 2 (and 1) comparison in the SCD Type 2 Loader.

                                     

SCD Type 7

So, at last, the finale. For a type 7 concept, we split up the type 6 dimension into two separate dimensions, one type 2, and one type 1. Even if we don’t have any specific type 3 columns in this concept, this solution delivers the same functionality. Of course, if there is a user requirement, you could introduce type 3 columns in the type 1 dimension (as shown above).

 

Even if we have to load two dimensions, that process is simpler, by using the standard DI Studio transformations. Since we already created type 1 and type 2 dimensions, I won’t bore you with a reprise. Instead, let us see how we connect our dimensions to the fact table.

 

For this exercise, I’ve created random sales figures based on the SASHELP.CARS table.

Type7_job.JPG

I might have been ungenerous by not supplying any data models, but it’s quite easy to reengineer a fact table load job to a star schema model. It’s obvious which table holds the facts, and which tables are the dimensions. To see details, just click on an object and see columns/mappings in the details pane. For completeness, this is how to set the parameters in the Lookup. The example for the SCD Type 2 dimension:

Type7_Lookup_typ2mapsource.JPG

And then, just map the business keys:

Type7_Lookup_typ2maptarget.JPG

Select the dimension key, and propagate it to the target (if not already defined). The naming is just for clarity in this example. It should probably have a more business friendly name in a real world scenario.


Type7_Lookup_type2_where.JPG

Finally, you need to filter out the appropriate (current) row to get the correct dimension key. If you need to load records for more than one update cycle of the dimension, you can’t use the Lookup transformation, since the filter cannot be data driven. In those situations, you can use SQL join using BETWEEN-AND join conditions.

The type 1 lookup looks the same, but there is no need for filtering (since there is just one row for each business/surrogate key).

 

Conclusion

I think what I have shown here is that SAS DI Studio is a capable and a flexible tool. It handles these 8 different scenarios with surprisingly little bending, and no user written code. Of course, I have used simple demo data, and we didn’t cover the whole ETL process. But still, I think it shows the power of SAS Data Integration technology.

And as with all type of development, situations can be solved in numerous ways. If you have suggestions for alternative ways to this, I’ll be happy to hear from you!

 

References

Margy Ross: Design Tip #152 Slowly Changing Dimension Types 0, 4, 5, 6 and 7

Ralph Kimball, Margy Ross: The Data Warehouse Toolkit (3rd edition)

 

Acknowledgements

For help with contnts, disposistion and language, my collegeues at Infotrek, and

 

Version history
Last update:
‎10-05-2015 03:33 PM
Updated by:
Contributors

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels