SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

SCD type 1 & 3 in SAS code

Reply
Regular Contributor
Posts: 168

SCD type 1 & 3 in SAS code

Any possibilities to produce a SCD type 1 & SCD type 3 report in base\advanced SAS code? If so, I request you to provide me the outline.

Thanks!

Respected Advisor
Posts: 4,173

Re: SCD type 1 & 3 in SAS code

What do you mean by "report"? SCD is mainly a way of how to store history in a table - and the main work is how to load the data into such a table structure. If you've got already a SCD table (either 1 or 3) then it shouldn't be that hard to query data and create a report. Not sure though what a "SCD 3 report" would be.

In DI Studio there is a loader for SCD1 and one for hybrid SCD1/SCD2 loading of a table. Anything else (like SCD3) is not "out-of-the-box" but you can code whatever you like using SAS language.

Designing and implementing code for SCD loading techniques is not that simple so don't expect that someone serves you the solution "on a plate".

Happy though to give you some feedback if you post in detail what you want to achieve and also show us what work and own thinking you've already done.

Regular Contributor
Posts: 168

Re: SCD type 1 & 3 in SAS code

Thanks Patrick for your response.

I meant output table as an report. Yes, I've achieved the SCD's in DI but I wish to implement the same via coding. Because there may a situation in near future where I need to implement the SCD's via base\advanced SAS.

I understand it is difficult to achieve it in coding and hence I request you to provide me the outline which I already mentioned in my earlier post.

Thanks.

Respected Advisor
Posts: 4,173

Re: SCD type 1 & 3 in SAS code

If you have access to DIS then why don't you look into SCD2 Loader generated code? This will give you a very good starting point for a user written version.

The DIS generated code is something which needs to work for a multitude of environments and any volumes of source and target table. If you develop your own user written version for a specific application then it's very possible that you can come up with more efficient code. I've had to do this once for an implementation (where they had no DIS) and managed to get the load done with a single pass through the data (for SCD2).

If you look into the DIS generated code then you'll see that the main approach is to compare records matching over business key. You create digest values (hex32 representation of md5 values for the variables under change tracking) and you then compare these digest values:

- Matching business key and digest values the same: do nothing

- Matching business key and digest values different: For type1: update the record in target, For  type2: update the record in target with an end datetime of "new start datetime - 1 second", insert new source record

- No matching business key in target: Insert the new record from source.

And then there are things like "expire records in target but not in source" or maintenance of a generated key (surrogate or retained key) which will need some additional coding if you want it to implement.

Regular Contributor
Posts: 168

Re: SCD type 1 & 3 in SAS code

Thanks again for your response. It would be grateful if you can explain the term 'digest values' in brief.

Respected Advisor
Posts: 4,173

Re: SCD type 1 & 3 in SAS code

SAS(R) Data Integration Studio 4.8: User's Guide

Assuming you're actually having access to DIS then look into SCD2 loader generated code and search for "digest"

Regular Contributor
Posts: 180

Re: SCD type 1 & 3 in SAS code

You can also use the MD5 function to create a message digest (digital signature), which is nearly unique for each string that is passed to the function.

Using this function you can quickly check if one or more variables (from a list) have changed and generate or not an SCD type 2 or 3 record.

CTorres

Super User
Posts: 5,438

Re: SCD type 1 & 3 in SAS code

It seems that you still are confusing the concepts of report and table (data set), which is strange if you have experience with DI Studio.

Patrikcs advice on looking in the existing SCD Type Loader code is the best way to get started.

Do already have a business case for manual coding SCD?

I think the first priority is the convince the application owners that an ETL tool approach is much better than creating and manging hand written code.

Data never sleeps
Ask a Question
Discussion stats
  • 7 replies
  • 1090 views
  • 1 like
  • 4 in conversation