SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

SCD 2 with multiple historical rows in one source table

Reply
Super Contributor
Posts: 408

SCD 2 with multiple historical rows in one source table

Hi all,

Our etl processes general ledger data from our asset management system into a dimensional datamart. It can happen that we must process mutliple months of data using the SCD2 transformation. They are conactenated into a single source table. As such the source table will contain multiple rows for the same key. The "valid from" date is in the data. This works quite well if I process one month worth of data. But if I have something like:

SecurityID, Date, MarketValue
ABCD, 01apr2011, 100
ABCD, 01may2011, 99

in the target dimension table the begin- and end dates get messed up. The first row is closed out with its own beginning date instead the next begindate - 1. When I allow multiple updates per day I get end-date=begin-date-1, which is even weirder.

My conclusion is that the current implementation of the scd2 transform does not support multiple occurences of the same business key in the source table but I find no evidence of that in the docs.

Or is it? Am I missing the magic option? Please help. Thanks in advance.

Regards, Jan.

Message was edited by: jklaverstijn Message was edited by: jklaverstijn
Respected Advisor
Posts: 3,887

Re: SCD 2 with multiple historical rows in one source table

Hi

I believe your conclusion is correct. I'm not aware of any option where one could define an additional sort variable for multiple business keys in one load. And if this is not possible then how would the SCD2 loader know in what order to add the new rows and which one should be the current one in the end.

If performance is not very critical then I believe in your case I would end up with a loop job.
The control table in the outer loop contains distinct dates from source (in ascending order), the inner job is then executed in serie loading source with a where condition ("date=&date_var") - and this "&date_var" is also used as value for valid_from_dttm.

I believe multiple updates per day means several loads with the same value for valid_from_dttm. That's not the case for you so you don't need this option.

If you can then use a retained key. It makes life so much easier when it comes to maintaining intersection tables for SCD2 primary tables.

HTH
Patrick
Regular Contributor
Posts: 182

Re: SCD 2 with multiple historical rows in one source table

Hi Jan,

I have run into similar problems with the SCD2 loader.
This was DI Studio 3.4 though, not sure what happened towards 4.2 / 4.3.

If I remember correctly we worked around the problem bij splitting the updates in separate batches, one per business key.

-Bart
Super Contributor
Posts: 408

Re: SCD 2 with multiple historical rows in one source table

Thanks Patrick and Bart, you seem to agree that it won't work the way I hoped. I will try and work out a way to split and iterate the tasks. I believe with many dimensions this will become tedious and maintenability will suffer.

Thanks for your input,
- Jan. Message was edited by: jklaverstijn
SAS Employee RMP
SAS Employee
Posts: 52

Re: SCD 2 with multiple historical rows in one source table

DI Studio 4.21

In SCD2 properties, under the options tab is a setting called "Allow multiple updates per day"

Maybe this is something to look at.

Richard
Ask a Question
Discussion stats
  • 4 replies
  • 888 views
  • 1 like
  • 4 in conversation