Solved: SAS DI Studio SCD Type 2 behaviour - duplicate records

Mitreto · Posted 02-16-2017 11:09 AM

Good afternoon all,

Please excuse me if this was already discussed, I wasn't able to find the same topic.

I'm new to the DI world and I'm working currently with SCD Type 2. May question is based on following:

the SAS doumentation says: If no differences in data are detected, then the source row is a duplicate of the target row. The source row is not loaded into the target as the new current row for that business key. The loading process moves on to the next source row. - http://support.sas.com/documentation/cdl/en/etlug/66819/HTML/default/viewer.htm#n03673ddo0y2twn114ep...

I'm running an exercise with 2 different tables to try and track the changes, between data using beginning and end dates and current indicator.

By pure chance I ran he same table twice, based on the theory posted above no records should be inserted [or so I thought] as no changes were made / it is the same table. I should mention that there are some 5 vars that I want to track and all the rest is Type 1...

I was surprised to see that the resulting table had duplicates. The beginning / end time and current indicator aparently worked.

There is propably something that I mis-read somewhere, but should the records be duplicated?

The process flow is below. The warning is because of a different lengths - an issue for me to investigate further.

Thank you

DavidGhan · Posted 03-01-2017 08:38 AM

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table.

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

View solution in original post

DavidGhan · Posted 02-16-2017 11:30 AM

I can't replicate what you describe when I try it. I get what the documentation, as you describe it, says. Perhaps if you could send a screencapture of these tabs in your transformation properties:

Mappings (with source columns displayed)

Detect Changes

Type 1 columns

Business Key

Including screencaptures for these other Tabs might also help:

Change Tracking

Generated Key

David

Mitreto · Posted 02-16-2017 12:11 PM

Hi David,

thank you for your reply,

please find the screens below:

Mappings:

Detect Changes:

Type 1:

BK:

Change Tracking:

GK:

The data:

Thank you

Marin

Patrick · Posted 02-16-2017 07:28 PM

What you describe should only happen if one of the columns under type 2 change tracking has a change in value for the same business key. I've used this transformation a lot; there can be some challenges with performance but I've never experienced that it got the load logic wrong.

Check in the data if your duplicates are really 100% duplicates.

Mitreto · Posted 02-17-2017 03:44 AM

Hi Patrick,

thank you for your reply.

They are 100% duplicates as I ran twice the same table.

There is only 1 thing - some of the dates are actually datetime and I'm not sure what will happen with the floating point. I'll try and remove the time leaving only the date available. All these are SCD Type 1 tab...

thank you

Marin

DavidGhan · Posted 02-17-2017 11:41 AM

The screencaptures you sent were a hard for me to read but I could not spot any issue there. I am able to replicate what you describe if the source table I use for the SCD Type 2 loader contains duplicate rows. The SCD Type 2 loader does not currently take into account duplicate rows in the source table and will add the first and then treat the second one as the new current record even if it is an exact duplicate. This is in a single execution of the transformation using a source table with duplicate rows. Check if you have duplicate records in your original source table that you use in your first load.

Mitreto · Posted 02-17-2017 12:23 PM

Hi David,

thank you for your reply. The original data contains only 10 records and as I run it twice it duplicates the rows.

Please find below the screens in larger size.

Thank you

Marin

DavidGhan · Posted 02-17-2017 04:50 PM

Those screenshots were visible. Thanks. I can't see why you get the results you do. The only way I can think of that it should happen is if the values for any of your SCD type 2 variables change between runs for each business key. Could you send me your data to me, and export your metadata to a package file and send me that package? To export the metadata to a package file, select all the metadata that is part of your job by highlighting them all and then selecting File -> Export -> SAS Package. Then follow the steps in the wizard. I could then reconstruct your job and see if I can figure out what is happeing. Also, what version of DI Studio are you using?

You could either send as an attachment here in the Forum or email as attachments to david.ghan@sas.com.

Patrick · Posted 02-18-2017 07:12 PM

@DavidGhan

If the OP can't share his data publicly but only with you then can you please share here what the issue was once resolved?

Mitreto · Posted 02-20-2017 09:15 AM

Hi all,

I'm sorry for the delayed reply, I was otherwise occupied.

I'll send the package to @DavidGhan and will make sure that and answer is posted once the issue is resolved.

Thank you

Marin

DavidGhan · Posted 03-01-2017 08:38 AM

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table.

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

Patrick · Posted 03-06-2017 04:19 PM

Makes sense. Blanks are relevant for md5() when calculating the digest value.

Mitreto · Posted 03-19-2017 10:36 AM

Hi all,

first, please accept my appologies for not getting back to you all and my delayed reply.

David, thank you for taking care of the case.

The only comment I have is that the target table was created as a copy of the source table. I'm not sure where the different lengths are coming from. If you look at the process flow no transformation should cause the issue as identified.

Thank you very much for your support.

Marin

ZUPA1 · Posted 01-30-2018 10:50 AM

Hi

I am having the same issue. However the LENGTH of all variables are the same on the SOURCE and TARGET. Is there anything else I need to do that can fix this issue?

Kind regards

Zubair

Patrick · Posted 01-31-2018 05:07 AM

@ZUPA1

Make sure the lengths are not only the same in SAS metadata but also in the underlying physical tables (=metadata is in sync with physical tables).

SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

SAS Innovate 2025: Save the Date