With the November 2022 stable release (2022.11) there is now the capability to Implement SCD (Slowly Changing Dimensions) in a SAS Studio Flow. This step requires a SAS Studio Engineer license.
SAS Studio supports two types of slowly changing dimension (SCD) scenarios:
The source table and target table for the SCD implementation must exist on the same database server. Currently only the following databases are supported: Oracle, Singlestore, Snowflake, SQL Server, or Teradata.
The first thing you need to consider when implementing Slowly Changing Dimensions (SCD) is which column(s) from your source table you want to include in the target table and which type of SCD you want to track for each of those columns. On the target table, in addition to including the column(s) from the source table that you want to track you will need to include the primary key column(s) (unique identifier from source table) for SCD Type 1 and the following columns for SCD Type 2:
For my example, this is the design of my source table, Customer, for which I want to implement SCD.
Note that the primary key column is Customer_ID. I also have an Inactive column indicator. It is best to mark a record in the source table as inactive rather than delete records when working with SCD.
I want to implement SCD Type 1 for the columns Title, FirstName, MiddleName, Inactive, and ModifiedDate and SCD Type 2 for the columns LastName, EmailAddress, and Phone. I also want to include the Notes column on the target table as an extra column. Therefore, this the design of my target table, Customer_History.
Note that I have included the necessary additional columns for SCD Types 1 and 2:
Also, two of my column names in the target are slightly different than in the source table – PhoneNumber maps to Phone and InactiveCustomer maps to Inactive.
Now that I have all the needed information about my source and target tables and which types of SCD I want to perform on which columns, I am ready to build my SAS Studio Flow to implement SCD.
First, I need to add my source table to the SAS Studio Flow. In my case, it is a SQL Server table named Customer. Reminder: SCD is currently only supported on the following databases: Oracle, Singlestore, Snowflake, SQL Server, or Teradata.
Next, I add the Implement SCD step from the Integrate section to the flow and connect it to my source table. Reminder: this step requires a SAS Studio Engineer license.
I select my Target Table information for SCD. Reminder the target table must exist on the same database server as the source table.
Next, on the Options tab, I specify the SCD options I want to implement. When selecting columns for these options, a column from the target table can only be selected once. In other words, a target table column cannot be selected as both a business key and a surrogate key.
The first required option is identifying the business key column(s) between the source and target. In my case this is the CustomerID column on both tables.
Note: When I select the column from the target table the corresponding source column is automatically mapped since they have the same name on both tables.
Next, I must specify column(s) for either SCD Type 1 or SCD Type2 or both. In my case, I have decided to do both.
First, I select the target columns Title, FirstName, MiddleName, InactiveCustomer, and ModifiedDate for SCD Type 1. I need to manually map InactiveCustomer to Inactive on the source table since the column names are different.
Next, I select the columns LastName, EmailAddress, and PhoneNumber for SCD Type 2. I need to manually map PhoneNumber to Phone on the source table since the column names are different.
Since I selected to perform SCD Type 2, I must also select fields for tracking the changes and a surrogate key. For tracking the changes, I select my Effective_From, Effective_To and Current_Flag columns on my target table.
For the required Surrogate key for SCD Type 2, I select the Customer_HistoryID column on my target table.
The final Options selection is for Include other columns. This an optional selection and something to note is that any column(s) selected here are only written to the target table if a new row is added. In other words, the value(s) from these column(s) are only written to the target table on the initial run and when SCD Type 2 is triggered by an update to the source table. In my case, select the CustomerNotes column on the target table and map it to the Notes column on the source table.
I can view the Column Structure for the target table.
I review the Column Resolution tab. Columns with same names are matched together.
I select the Preview Data tab. There aren’t any records to preview yet since my target table is empty.
I save my SAS Studio Flow to Implement SCD for my Customer table and add a description to the node.
Now I am ready for initial run of the Slowly Changing Dimensions (SCD) SAS Studio Flow for my Customer table. Below is the current state of the records for my Customer table.
I run my Implement SCD for Customer SAS Studio Flow.
I can view the submitted code for my SCD implementation.
My Customer_History table is a copy of the current state of my Customer table with the Effective_From, Effective_To, and Current_Flag columns completed. This is the case since it is the initial run of SCD.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
The Customer table has been updated as follows since the last run of the flow.
The record for CustomerID 1 triggers SCD Type 1 for the update of the FirstName and MiddleName columns and SCD Type 2 for the update the Phone column. The record for CustomerID 4 triggers only SCD Type 1 for the update of the Inactive column.
I re-run the Implement SCD for Customer SAS Studio Flow.
Now my Customer_History table is updated as shown below.
A new row is created for CustomerID 1 since it triggered an SCD Type 2 change. Its SCD Type 1 changes are included in that row also and the previous row Current_Flag is changed to 0. CustomerID 4 is simply updated with its new InactiveCustomer status since only SCD Type 1 was triggered for that row which also means its Notes from the source table were not updated in the Customer_History table since a new row was not created.
For both tables in most cases I will want to filter for where Inactive=0 to make sure I am reporting only on current customers. (Note this column on my target table is named InactiveCustomer). Also, for the Customer_History table I will want to group the records by the Customer_ID column to ensure I’m reporting on the full history of a customer and include the Current_Flag column, so I know which record is current in the grouping. To ensure I am working with only the current records, I want to filter for Current_Flag=1.
The Implement SCD step is now available in SAS Studio Flow with the SAS Studio Engineer license for the following databases: Oracle, Singlestore, Snowflake, SQL Server, or Teradata. For more information review the documentation for the step: SAS Help Center: Implement SCD: Storing and Managing Data over Time.
Find more articles from SAS Global Enablement and Learning here.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.