BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sameer112217
Quartz | Level 8

 

CLUSTERID NAME           SURNAME

121              Amit                 Sharma
122               Nikhil                Devang
122                Nikhil                Devang

 

Suppose If I am running a SAS DI SCD2 type transformation and the business key is Cluster Id and if I am generating surrogate key which is clusterno, then in the final table I am getting the values in the Active_FLAg for Nikhil Devang only in the last record of that id 122 as Y which I defined in the current indicator as Y

 

24JUL2019:14:42:11.074530   24JUL2019:14:42:11.074530 122  Nikhil N 10 Devang
31DEC2099:00:00:00.000000  24JUL2019:14:42:12.074530 122 Nikhil Y 10 Devang

 

What if I want Y in both the rows. Also if there is cluster_id in the final table. And if I insert two new entry with cluster id 122, then I would want Y for both the newly inserted cluster. How is it possible?

 

I want the below results 

 

31DEC2099:00:00:00.000000  24JUL2019:14:42:12.074530 122  Nikhil Y 10 Devang
31DEC2099:00:00:00.000000  24JUL2019:14:42:12.074530 122 Nikhil Y 10 Devang

 

Only for last value in the active flag 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Ideally post DIS questions into the data management forum to get more people with the right skills looking into your question.

 

What you're asking for is not possible - and rightly so.

 

An SCD2 table may not have more than one current record per business key. Also overlapping start/end dates for the same business key are not allowed and would create a mess if it ever happened. That's not a SAS "thing" but how type 2 tables work.

 

A business key is a primary key and should be unique in your source table (per load). You should implement logic for "de-duping" prior to the load into the type 2 target table.

The SAS SCD2 transformation deals with duplicates the way you observe it - just by expiring duplicates "immediately" while loading - but this just increases the volumes in your type 2 table without adding value.  

 

....and I guess now it's also clear why there can only be a single change current indicator (active_flag) set to "Y" per business key.

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

Ideally post DIS questions into the data management forum to get more people with the right skills looking into your question.

 

What you're asking for is not possible - and rightly so.

 

An SCD2 table may not have more than one current record per business key. Also overlapping start/end dates for the same business key are not allowed and would create a mess if it ever happened. That's not a SAS "thing" but how type 2 tables work.

 

A business key is a primary key and should be unique in your source table (per load). You should implement logic for "de-duping" prior to the load into the type 2 target table.

The SAS SCD2 transformation deals with duplicates the way you observe it - just by expiring duplicates "immediately" while loading - but this just increases the volumes in your type 2 table without adding value.  

 

....and I guess now it's also clear why there can only be a single change current indicator (active_flag) set to "Y" per business key.

LinusH
Tourmaline | Level 20
Actually,I think there is a use case for not deduplicating the imput - if your data changes multiple times per day, you still might want to capture this even if your bacth runs daily.
If I recall right SCDType2 doesn't do MD5() comparison between input rows, so the responsiblity to feed only relevant records is on the developer - deduplicting on the record rather on the key.
Data never sleeps
Patrick
Opal | Level 21

@LinusH 

If you load multiple records with the same business key in a single batch then you don't really have control over the load order within the tuple and which record will in the end become the active one. I'm no more sure if the SCD2 generated code maintains the sort order of the source table - but even if it does it's certainly not documented nor guaranteed behaviour.

 

I believe that if you want to load multiple changes ("duplicates") in a single batch then you would need to implement the SCD2 loader within a loop transformation and feed one "duplicate" per iteration in the sequence of the changes (with the loop executing in sequence and not in parallel).

 

"If I recall right SCDType2 doesn't do MD5() comparison between input rows"

You recall right. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1245 views
  • 1 like
  • 3 in conversation