SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

SAS DI Studio SCD Type 2 behaviour - duplicate records

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

SAS DI Studio SCD Type 2 behaviour - duplicate records

[ Edited ]

Good afternoon all,

Please excuse me if this was already discussed, I wasn't able to find the same topic.

 

I'm new to the DI world and I'm working currently with SCD Type 2. May question is based on following:

 

the SAS doumentation says: If no differences in data are detected, then the source row is a duplicate of the target row. The source row is not loaded into the target as the new current row for that business key. The loading process moves on to the next source row.  - http://support.sas.com/documentation/cdl/en/etlug/66819/HTML/default/viewer.htm#n03673ddo0y2twn114ep...

I'm running an exercise with 2 different tables to try and track the changes, between data using beginning and end dates and current indicator.

By pure chance I ran he same table twice, based on the theory posted above no records should be inserted [or so I thought] as no changes were made / it is the same table. I should mention that there are some 5 vars that I want to track and all the rest is Type 1...

I was surprised to see that the resulting table had duplicates. The beginning / end time and current indicator aparently worked.

 

There is propably something that I mis-read somewhere, but should the records be duplicated?

 

The process flow is below. The warning is because of a different lengths - an issue for me to investigate further.

 

SAS SCD2.jpg

 

Thank you


Accepted Solutions
Solution
‎03-19-2017 10:28 AM
SAS Employee
Posts: 21

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table. 

 

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

View solution in original post


All Replies
SAS Employee
Posts: 21

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

I can't replicate what you describe when I try it. I get what the documentation, as you describe it, says. Perhaps if you could send a screencapture of these tabs in your transformation properties:

 

Mappings (with source columns displayed)

Detect Changes

Type 1 columns

Business Key

 

Including screencaptures for these other Tabs might also help:

 

Change Tracking

Generated Key

 

David

 

Occasional Contributor
Posts: 8

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Hi David,

thank you for your reply,

please find the screens below:

Mappings:

Mappings.jpg

Detect Changes:

DetectChanges.jpg

Type 1:

Type1.jpg

BK:

BK.jpg

Change Tracking:
Change Track.jpg

GK:

GK.jpg

 

The data:

Data.jpg

 

Thank you

Marin

Respected Advisor
Posts: 3,892

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

What you describe should only happen if one of the columns under type 2 change tracking has a change in value for the same business key. I've used this transformation a lot; there can be some challenges with performance but I've never experienced that it got the load logic wrong.

 

Check in the data if your duplicates are really 100% duplicates.

Occasional Contributor
Posts: 8

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Hi Patrick,

thank you for your reply.

They are 100% duplicates as I ran twice the same table.

There is only 1 thing - some of the dates are actually datetime and I'm not sure what will happen with the floating point. I'll try and remove the time leaving only the date available. All these are SCD Type 1 tab...

 

thank you

Marin

SAS Employee
Posts: 21

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

The screencaptures you sent were a hard for me to read but I could not spot any issue there. I am able to replicate what you describe if the source table I use for the SCD Type 2 loader contains duplicate rows. The SCD Type 2 loader does not currently take into account duplicate rows in the source table and will add the first and then treat the second one as the new current record even if it is an exact duplicate. This is in a single execution of the transformation using a source table with duplicate rows. Check if you have duplicate records in your original source table that you use in your first load.

Occasional Contributor
Posts: 8

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Hi David,

thank you for your reply. The original data contains only 10 records and as I run it twice it duplicates the rows.

Please find below the screens in larger size.

Thank you

Marin

Change Track.jpg

BK.jpg

GK.jpg

DetectChanges.jpg

Type1.jpg

Mappings.jpg

 

Data.jpg

SAS Employee
Posts: 21

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Those screenshots were visible. Thanks. I can't see why you get the results you do. The only way I can think of that it should happen is if the values for any of your SCD type 2 variables change between runs for each business key. Could you send me your data to me, and export your metadata to a package file and send me that package? To export the metadata to a package file, select all the metadata that is part of your job by highlighting them all and then selecting File -> Export -> SAS Package. Then follow the steps in the wizard.  I could then reconstruct your job and see if I can figure out what is happeing. Also, what version of DI Studio are you using?

 You could either send as an attachment here in the Forum or email as attachments to david.ghan@sas.com.

 

Respected Advisor
Posts: 3,892

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

@DavidGhan

If the OP can't share his data publicly but only with you then can you please share here what the issue was once resolved?

Occasional Contributor
Posts: 8

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Hi all,

I'm sorry for the delayed reply, I was otherwise occupied.

I'll send the package to @DavidGhan and will make sure that and answer is posted once the issue is resolved.

 

Thank you

Marin

Solution
‎03-19-2017 10:28 AM
SAS Employee
Posts: 21

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table. 

 

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

Respected Advisor
Posts: 3,892

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Makes sense. Blanks are relevant for md5() when calculating the digest value.

Occasional Contributor
Posts: 8

Re: SAS DI Studio SCD Type 2 behaviour - duplicate records

Hi all,

first, please accept my appologies for not getting back to you all and my delayed reply.

David, thank you for taking care of the case.

 

The only comment I have is that the target table was created as a copy of the source table. I'm not sure where the different lengths are coming from. If you look at the process flow no transformation should cause the issue as identified.

 

Thank you very much for your support.

 

Marin

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 617 views
  • 4 likes
  • 3 in conversation