BookmarkSubscribeRSS Feed

SAS Viya: Implement SCD (Slowly Changing Dimensions) in SAS Studio Flow

Started ‎02-14-2023 by
Modified ‎02-14-2023 by
Views 1,400

With the November 2022 stable release (2022.11) there is now the capability to Implement SCD (Slowly Changing Dimensions) in a SAS Studio Flow.  This step requires a SAS Studio Engineer license.  

 

SAS Studio supports two types of slowly changing dimension (SCD) scenarios:

 

  • Type 1: no history of data changes. Specified columns are overwritten in the target table without retaining a history of changes. This is useful for maintaining columns that are not used for historical analysis.
  • Type 2: history of data changes is maintained. A new row is created in the target table each time a specified column is changed in the source table for a particular record. The most recent changed row is marked as current and the other rows for that record contain the history of its data changes.

 

The source table and target table for the SCD implementation must exist on the same database server.  Currently only the following databases are supported: Oracle, Singlestore, Snowflake, SQL Server, or Teradata.

   

Slowly Changing Dimension (SCD) Target Table Setup

 

The first thing you need to consider when implementing Slowly Changing Dimensions (SCD) is which column(s) from your source table you want to include in the target table and which type of SCD you want to track for each of those columns. On the target table, in addition to including the column(s) from the source table that you want to track you will need to include the primary key column(s) (unique identifier from source table) for SCD Type 1 and the following columns for SCD Type 2:

 

  • Primary key column(s) (unique identifier from source table)
  • Effective Start Date
  • Effective End Date
  • Current Flag
  • Surrogate Key (unique identifier for history records).

 

For my example, this is the design of my source table, Customer, for which I want to implement SCD.

 

1_SCD.png

 

Note that the primary key column is Customer_ID. I also have an Inactive column indicator.  It is best to mark a record in the source table as inactive rather than delete records when working with SCD.  

 

I want to implement SCD Type 1 for the columns Title, FirstName, MiddleName, Inactive, and ModifiedDate and SCD Type 2 for the columns LastName, EmailAddress, and Phone. I also want to include the Notes column on the target table as an extra column.  Therefore, this the design of my target table, Customer_History.

 

2_SCD.png

 

Note that I have included the necessary additional columns for SCD Types 1 and 2:

 

  • Primary key column(s) (unique identifier from source table): Customer_ID
  • Effective Start Date: Effective_Start
  • Effective End Date: Effective_End
  • Current Flag: Current_Flag
  • Surrogate Key (unique identifier for history records):

 

Also, two of my column names in the target are slightly different than in the source table – PhoneNumber maps to Phone and InactiveCustomer maps to Inactive.    

 

Create SAS Studio Flow for Slowly Changing Dimensions (SCD)

 

Now that I have all the needed information about my source and target tables and which types of SCD I want to perform on which columns, I am ready to build my SAS Studio Flow to implement SCD.  

 

First, I need to add my source table to the SAS Studio Flow.  In my case, it is a SQL Server table named Customer.  Reminder:  SCD is currently only supported on the following databases: Oracle, Singlestore, Snowflake, SQL Server, or Teradata.

 

3_SCD.png

 

Next, I add the Implement SCD step from the Integrate section to the flow and connect it to my source table.  Reminder: this step requires a SAS Studio Engineer license.

 

4_SCD.png

 

I select my Target Table information for SCD.  Reminder the target table must exist on the same database server as the source table.

 

5_SCD.png

 

Next, on the Options tab, I specify the SCD options I want to implement.  When selecting columns for these options, a column from the target table can only be selected once.  In other words, a target table column cannot be selected as both a business key and a surrogate key.  

 

The first required option is identifying the business key column(s) between the source and target.  In my case this is the CustomerID column on both tables.

 

6_SCD-1536x740.png

 

Note:  When I select the column from the target table the corresponding source column is automatically mapped since they have the same name on both tables.  

 

Next, I must specify column(s) for either SCD Type 1 or SCD Type2 or both.  In my case, I have decided to do both.  

 

First, I select the target columns Title, FirstName, MiddleName, InactiveCustomer, and ModifiedDate for SCD Type 1.  I need to manually map InactiveCustomer to Inactive on the source table since the column names are different.

 

7_SCD-1024x578.png

 

Next, I select the columns LastName, EmailAddress, and PhoneNumber for SCD Type 2.  I need to manually map PhoneNumber to Phone on the source table since the column names are different.

 

8_SCD-1024x575.png

 

Since I selected to perform SCD Type 2, I must also select fields for tracking the changes and a surrogate key.  For tracking the changes, I select my Effective_From, Effective_To and Current_Flag columns on my target table.

 

9_SCD-1024x667.png

 

For the required Surrogate key for SCD Type 2, I select the Customer_HistoryID column on my target table.

 

10_SCD-1024x512.png

 

The final Options selection is for Include other columns.  This an optional selection and something to note is that any column(s) selected here are only written to the target table if a new row is added.  In other words, the value(s) from these column(s) are only written to the target table on the initial run and when SCD Type 2 is triggered by an update to the source table.  In my case, select the CustomerNotes column on the target table and map it to the Notes column on the source table.

 

11_SCD-1024x514.png

 

I can view the Column Structure for the target table.

 

12_SCD-1024x695.png

 

 I review the Column Resolution tab. Columns with same names are matched together.

 

13_SCD.png

 

I select the Preview Data tab.  There aren’t any records to preview yet since my target table is empty.

 

14_SCD-1024x437.png

 

I save my SAS Studio Flow to Implement SCD for my Customer table and add a description to the node.

 

15_SCD.png

  

Initial Run of SCD SAS Studio Flow

 

Now I am ready for initial run of the Slowly Changing Dimensions (SCD) SAS Studio Flow for my Customer table.  Below is the current state of the records for my Customer table.

 

16_SCD-1536x245.png

 

I run my Implement SCD for Customer SAS Studio Flow.

 

17_SCD.png

 

I can view the submitted code for my SCD implementation.

 

18_SCD.png

 

My Customer_History table is a copy of the current state of my Customer table with the Effective_From, Effective_To, and Current_Flag columns completed.  This is the case since it is the initial run of SCD.

 

19_SCD-1536x148.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

SCD SAS Studio Flow Run After Source Table Update

 

The Customer table has been updated as follows since the last run of the flow.

 

20_SCD-1536x209.png

 

The record for CustomerID 1 triggers SCD Type 1 for the update of the FirstName and MiddleName columns and SCD Type 2 for the update the Phone column.  The record for CustomerID 4 triggers only SCD Type 1 for the update of the Inactive column.  

 

I re-run the Implement SCD for Customer SAS Studio Flow.

 

21_17_SCD (1).png

 

Now my Customer_History table is updated as shown below.

 

21b_SCD-1536x171.png

 

A new row is created for CustomerID 1 since it triggered an SCD Type 2 change.  Its SCD Type 1 changes are included in that row also and the previous row Current_Flag is changed to 0.  CustomerID 4 is simply updated with its new InactiveCustomer status since only SCD Type 1 was triggered for that row which also means its Notes from the source table were not updated in the Customer_History table since a new row was not created.    

 

Considerations for Reporting on the Tables

 

For both tables in most cases I will want to filter for where Inactive=0 to make sure I am reporting only on current customers.  (Note this column on my target table is named InactiveCustomer).  Also, for the Customer_History table I will want to group the records by the Customer_ID column to ensure I’m reporting on the full history of a customer and include the Current_Flag column, so I know which record is current in the grouping. To ensure I am working with only the current records, I want to filter for Current_Flag=1.    

 

Summary

 

The Implement SCD step is now available in SAS Studio Flow with the SAS Studio Engineer license for the following databases: Oracle, Singlestore, Snowflake, SQL Server, or Teradata.  For more information review the documentation for the step: SAS Help Center: Implement SCD: Storing and Managing Data over Time.

 

Acknowledgements

 

Special thanks to my colleagues @NicolasRobert and @UttamKumar for their assistance.

 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎02-14-2023 09:06 AM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags