We’re smarter together. Learn from this collection of community knowledge and add your expertise.

SAS Data Integration Studio Capability Test: SCD Types 0 through 7, pt 1

by Super User on ‎03-07-2015 04:57 AM - edited on ‎10-05-2015 03:42 PM by Community Manager (1,593 Views)

SCDType0-7 slim.jpg

 

Introduction

Most people who are familiar with data warehouse concepts knows about the concept of Slowly Changing Dimensions. It’s been a part of the standard toolbox for data warehouse implementations since Ralph Kimball published “The Data Warehouse Toolkit” in the late 90’s. But with time, it came clear that not all business cases could be solved by the original SCD Types (1-3). New techniques have been formulated. Some based on combination of the existing ones, others are new. In 2013, the Kimball Group formulated a new set of SCD Types, now ranging from 0 to 7.

 

The focus for this article is to use these different SCD types as different loading scenarios. And use them as base for testing how SAS DI Studio can adopt these different techniques. In this article we focus on loading dimensions specifically, even if some of the SCD types can be adopted in other types of tables in a data warehouse. For as more comprehensive description of the different SCD types, see Margy Ross’ article.

 

We use surrogate keys in each example in our dimensional tables. It not always ideal to use them, but as soon as you have a composite business key, a surrogate key makes sense. If you would like to use hash keys instead, use my article Using SAS DI Studio to Load a Data Vault, Part II (DV2.0) as a starting point.

 

In this post I’ll setting up the stage with the different SCD types, and will exemplify the first two (types 0 and 1). DI Studio examples for the remaining SCD types will be covered in future posts.

 

The Different Slowly Changing Dimension Types

A short summary of the different types. They will be described a bit more under each section.

 

    0.   Retain Original. Once written, never change. Time dimensions attributes are typical Type 0. Other uses could be “Original Sales Channel”.

  1. Overwrite. If an attribute value for a dimension key changes, overwrite the row with latest (current) value(s).
  2. Keep history records. If there is a change for an attribute value, add a row with new/updated value(s). To keep track of validity of records you need start/end-date(time) columns. Other helper columns could be version number and current indicator. Perhaps the most common type, especially when it comes to the detail data layer of a data warehouse.
  3. Keep history columns. If an attribute value changes, move the current value to a column for outdated values, and store the current value in a “current” column. Not so popular because they tend to be complicated to update, query, and their static nature.
  4. Mini-dimension. Can be seen as a helper dimension which can complement larger dimensions with aggregated levels.
  5. Mini-dimension and Type 1 Outrigger. Types 1 + 4 = 5. This means that the mini-dimension key is stored in another dimension, as a SCD Type 1 attribute. If a customer have a changed score, the mini dimension key in the customer dimension will be overwritten with the new mini-dimension key.
  6. Type 1 and 3 Attributes in a Type 2 Dimension. Is based on Type 2, with Type 1 columns that are overwritten for all records. Plus type 3 columns that is historized as Type 2: 1 +2 +3 = 6. Let’s say you have a current product group column, which for all rows are constantly updated with the current value. And then a corresponding historized product grouping column.
  7. Dual Type 1 and Type 2 Dimensions. You split up the Type 6 into two separate tables, one Type 1 (current) and one Type 2 (historized) dimension. This means two dimension FK for a product dimension in the fact table.

 

SAS DI Studio and Slowly Changing Dimensions

In SAS DI Studio, there are two transformations specifically created for this purpose, the SCD Type 1 and 2 Loaders. Let us now see how we can use them, and other transformations to fulfill each SCD Type. Let me be clear that each scenario is not meant to be fully logical, or common in the real world, they are solely used to demonstrate DI Studio capability.

 

SCD Type 0

“Retain Original” – sounds easy. What I will use here is the Surrogate Key Generator transformation.

Type0_job.jpg

It uses as input along with the source table, the dimension table (which is unfortunately not shown in the job editor). Now we are only interested to fetch new records (since we will never overwrite). We can use a column in the target table (which should be defined as NOT NULL), which is not present in the source data. This to figure out if the records is coming from the source alone. And then a table loader that uses the Append technique – which is fine since we already have matched and filtered out existing rows in the previous steps.

 

SCD Type 1

Overwrite with the current values. Easy, especially when there is a ready-made transformation for this– the SCD Type 1 Loader.

Type1_job.jpg

This one is of course part of the DI Studio documentation, so I won’t go into detail here. Just want to mention that it generates a surrogate key for you, if you choose.

 

This was all that fitted into this first part. Watch the community for details on the other SCD Types in future posts. Until then, let me hear any comments from you about this first post!

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.