BookmarkSubscribeRSS Feed

SAS Data Integration Studio Capability Test: SCD Types 0 through 7, pt 1

Started ‎03-07-2015 by
Modified ‎10-05-2015 by
Views 2,809

SCDType0-7 slim.jpg

 

Introduction

Most people who are familiar with data warehouse concepts knows about the concept of Slowly Changing Dimensions. It’s been a part of the standard toolbox for data warehouse implementations since Ralph Kimball published “The Data Warehouse Toolkit” in the late 90’s. But with time, it came clear that not all business cases could be solved by the original SCD Types (1-3). New techniques have been formulated. Some based on combination of the existing ones, others are new. In 2013, the Kimball Group formulated a new set of SCD Types, now ranging from 0 to 7.

 

The focus for this article is to use these different SCD types as different loading scenarios. And use them as base for testing how SAS DI Studio can adopt these different techniques. In this article we focus on loading dimensions specifically, even if some of the SCD types can be adopted in other types of tables in a data warehouse. For as more comprehensive description of the different SCD types, see Margy Ross’ article.

 

We use surrogate keys in each example in our dimensional tables. It not always ideal to use them, but as soon as you have a composite business key, a surrogate key makes sense. If you would like to use hash keys instead, use my article Using SAS DI Studio to Load a Data Vault, Part II (DV2.0) as a starting point.

 

In this post I’ll setting up the stage with the different SCD types, and will exemplify the first two (types 0 and 1). DI Studio examples for the remaining SCD types will be covered in future posts.

 

The Different Slowly Changing Dimension Types

A short summary of the different types. They will be described a bit more under each section.

 

    0.   Retain Original. Once written, never change. Time dimensions attributes are typical Type 0. Other uses could be “Original Sales Channel”.

  1. Overwrite. If an attribute value for a dimension key changes, overwrite the row with latest (current) value(s).
  2. Keep history records. If there is a change for an attribute value, add a row with new/updated value(s). To keep track of validity of records you need start/end-date(time) columns. Other helper columns could be version number and current indicator. Perhaps the most common type, especially when it comes to the detail data layer of a data warehouse.
  3. Keep history columns. If an attribute value changes, move the current value to a column for outdated values, and store the current value in a “current” column. Not so popular because they tend to be complicated to update, query, and their static nature.
  4. Mini-dimension. Can be seen as a helper dimension which can complement larger dimensions with aggregated levels.
  5. Mini-dimension and Type 1 Outrigger. Types 1 + 4 = 5. This means that the mini-dimension key is stored in another dimension, as a SCD Type 1 attribute. If a customer have a changed score, the mini dimension key in the customer dimension will be overwritten with the new mini-dimension key.
  6. Type 1 and 3 Attributes in a Type 2 Dimension. Is based on Type 2, with Type 1 columns that are overwritten for all records. Plus type 3 columns that is historized as Type 2: 1 +2 +3 = 6. Let’s say you have a current product group column, which for all rows are constantly updated with the current value. And then a corresponding historized product grouping column.
  7. Dual Type 1 and Type 2 Dimensions. You split up the Type 6 into two separate tables, one Type 1 (current) and one Type 2 (historized) dimension. This means two dimension FK for a product dimension in the fact table.

 

SAS DI Studio and Slowly Changing Dimensions

In SAS DI Studio, there are two transformations specifically created for this purpose, the SCD Type 1 and 2 Loaders. Let us now see how we can use them, and other transformations to fulfill each SCD Type. Let me be clear that each scenario is not meant to be fully logical, or common in the real world, they are solely used to demonstrate DI Studio capability.

 

SCD Type 0

“Retain Original” – sounds easy. What I will use here is the Surrogate Key Generator transformation.

Type0_job.jpg

It uses as input along with the source table, the dimension table (which is unfortunately not shown in the job editor). Now we are only interested to fetch new records (since we will never overwrite). We can use a column in the target table (which should be defined as NOT NULL), which is not present in the source data. This to figure out if the records is coming from the source alone. And then a table loader that uses the Append technique – which is fine since we already have matched and filtered out existing rows in the previous steps.

 

SCD Type 1

Overwrite with the current values. Easy, especially when there is a ready-made transformation for this– the SCD Type 1 Loader.

Type1_job.jpg

This one is of course part of the DI Studio documentation, so I won’t go into detail here. Just want to mention that it generates a surrogate key for you, if you choose.

 

This was all that fitted into this first part. Watch the community for details on the other SCD Types in future posts. Until then, let me hear any comments from you about this first post!

Version history
Last update:
‎10-05-2015 03:42 PM
Updated by:
Contributors

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels