Good afternoon all,
Please excuse me if this was already discussed, I wasn't able to find the same topic.
I'm new to the DI world and I'm working currently with SCD Type 2. May question is based on following:
the SAS doumentation says: If no differences in data are detected, then the source row is a duplicate of the target row. The source row is not loaded into the target as the new current row for that business key. The loading process moves on to the next source row. - http://support.sas.com/documentation/cdl/en/etlug/66819/HTML/default/viewer.htm#n03673ddo0y2twn114ep...
I'm running an exercise with 2 different tables to try and track the changes, between data using beginning and end dates and current indicator.
By pure chance I ran he same table twice, based on the theory posted above no records should be inserted [or so I thought] as no changes were made / it is the same table. I should mention that there are some 5 vars that I want to track and all the rest is Type 1...
I was surprised to see that the resulting table had duplicates. The beginning / end time and current indicator aparently worked.
There is propably something that I mis-read somewhere, but should the records be duplicated?
The process flow is below. The warning is because of a different lengths - an issue for me to investigate further.
Thank you
Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table.
Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.
I can't replicate what you describe when I try it. I get what the documentation, as you describe it, says. Perhaps if you could send a screencapture of these tabs in your transformation properties:
Mappings (with source columns displayed)
Detect Changes
Type 1 columns
Business Key
Including screencaptures for these other Tabs might also help:
Change Tracking
Generated Key
David
Hi David,
thank you for your reply,
please find the screens below:
Mappings:
Detect Changes:
Type 1:
BK:
Change Tracking:
GK:
The data:
Thank you
Marin
What you describe should only happen if one of the columns under type 2 change tracking has a change in value for the same business key. I've used this transformation a lot; there can be some challenges with performance but I've never experienced that it got the load logic wrong.
Check in the data if your duplicates are really 100% duplicates.
Hi Patrick,
thank you for your reply.
They are 100% duplicates as I ran twice the same table.
There is only 1 thing - some of the dates are actually datetime and I'm not sure what will happen with the floating point. I'll try and remove the time leaving only the date available. All these are SCD Type 1 tab...
thank you
Marin
The screencaptures you sent were a hard for me to read but I could not spot any issue there. I am able to replicate what you describe if the source table I use for the SCD Type 2 loader contains duplicate rows. The SCD Type 2 loader does not currently take into account duplicate rows in the source table and will add the first and then treat the second one as the new current record even if it is an exact duplicate. This is in a single execution of the transformation using a source table with duplicate rows. Check if you have duplicate records in your original source table that you use in your first load.
Hi David,
thank you for your reply. The original data contains only 10 records and as I run it twice it duplicates the rows.
Please find below the screens in larger size.
Thank you
Marin
Those screenshots were visible. Thanks. I can't see why you get the results you do. The only way I can think of that it should happen is if the values for any of your SCD type 2 variables change between runs for each business key. Could you send me your data to me, and export your metadata to a package file and send me that package? To export the metadata to a package file, select all the metadata that is part of your job by highlighting them all and then selecting File -> Export -> SAS Package. Then follow the steps in the wizard. I could then reconstruct your job and see if I can figure out what is happeing. Also, what version of DI Studio are you using?
You could either send as an attachment here in the Forum or email as attachments to david.ghan@sas.com.
If the OP can't share his data publicly but only with you then can you please share here what the issue was once resolved?
Hi all,
I'm sorry for the delayed reply, I was otherwise occupied.
I'll send the package to @DavidGhan and will make sure that and answer is posted once the issue is resolved.
Thank you
Marin
Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table.
Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.
Makes sense. Blanks are relevant for md5() when calculating the digest value.
Hi all,
first, please accept my appologies for not getting back to you all and my delayed reply.
David, thank you for taking care of the case.
The only comment I have is that the target table was created as a copy of the source table. I'm not sure where the different lengths are coming from. If you look at the process flow no transformation should cause the issue as identified.
Thank you very much for your support.
Marin
Hi
I am having the same issue. However the LENGTH of all variables are the same on the SOURCE and TARGET. Is there anything else I need to do that can fix this issue?
Kind regards
Zubair
Make sure the lengths are not only the same in SAS metadata but also in the underlying physical tables (=metadata is in sync with physical tables).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.