BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mitreto
Fluorite | Level 6

Good afternoon all,

Please excuse me if this was already discussed, I wasn't able to find the same topic.

 

I'm new to the DI world and I'm working currently with SCD Type 2. May question is based on following:

 

the SAS doumentation says: If no differences in data are detected, then the source row is a duplicate of the target row. The source row is not loaded into the target as the new current row for that business key. The loading process moves on to the next source row.  - http://support.sas.com/documentation/cdl/en/etlug/66819/HTML/default/viewer.htm#n03673ddo0y2twn114ep...

I'm running an exercise with 2 different tables to try and track the changes, between data using beginning and end dates and current indicator.

By pure chance I ran he same table twice, based on the theory posted above no records should be inserted [or so I thought] as no changes were made / it is the same table. I should mention that there are some 5 vars that I want to track and all the rest is Type 1...

I was surprised to see that the resulting table had duplicates. The beginning / end time and current indicator aparently worked.

 

There is propably something that I mis-read somewhere, but should the records be duplicated?

 

The process flow is below. The warning is because of a different lengths - an issue for me to investigate further.

 

SAS SCD2.jpg

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
DavidGhan
SAS Employee

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table. 

 

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

View solution in original post

15 REPLIES 15
DavidGhan
SAS Employee

I can't replicate what you describe when I try it. I get what the documentation, as you describe it, says. Perhaps if you could send a screencapture of these tabs in your transformation properties:

 

Mappings (with source columns displayed)

Detect Changes

Type 1 columns

Business Key

 

Including screencaptures for these other Tabs might also help:

 

Change Tracking

Generated Key

 

David

 

Mitreto
Fluorite | Level 6

Hi David,

thank you for your reply,

please find the screens below:

Mappings:

Mappings.jpg

Detect Changes:

DetectChanges.jpg

Type 1:

Type1.jpg

BK:

BK.jpg

Change Tracking:
Change Track.jpg

GK:

GK.jpg

 

The data:

Data.jpg

 

Thank you

Marin

Patrick
Opal | Level 21

What you describe should only happen if one of the columns under type 2 change tracking has a change in value for the same business key. I've used this transformation a lot; there can be some challenges with performance but I've never experienced that it got the load logic wrong.

 

Check in the data if your duplicates are really 100% duplicates.

Mitreto
Fluorite | Level 6

Hi Patrick,

thank you for your reply.

They are 100% duplicates as I ran twice the same table.

There is only 1 thing - some of the dates are actually datetime and I'm not sure what will happen with the floating point. I'll try and remove the time leaving only the date available. All these are SCD Type 1 tab...

 

thank you

Marin

DavidGhan
SAS Employee

The screencaptures you sent were a hard for me to read but I could not spot any issue there. I am able to replicate what you describe if the source table I use for the SCD Type 2 loader contains duplicate rows. The SCD Type 2 loader does not currently take into account duplicate rows in the source table and will add the first and then treat the second one as the new current record even if it is an exact duplicate. This is in a single execution of the transformation using a source table with duplicate rows. Check if you have duplicate records in your original source table that you use in your first load.

Mitreto
Fluorite | Level 6

Hi David,

thank you for your reply. The original data contains only 10 records and as I run it twice it duplicates the rows.

Please find below the screens in larger size.

Thank you

Marin

Change Track.jpg

BK.jpg

GK.jpg

DetectChanges.jpg

Type1.jpg

Mappings.jpg

 

Data.jpg

DavidGhan
SAS Employee

Those screenshots were visible. Thanks. I can't see why you get the results you do. The only way I can think of that it should happen is if the values for any of your SCD type 2 variables change between runs for each business key. Could you send me your data to me, and export your metadata to a package file and send me that package? To export the metadata to a package file, select all the metadata that is part of your job by highlighting them all and then selecting File -> Export -> SAS Package. Then follow the steps in the wizard.  I could then reconstruct your job and see if I can figure out what is happeing. Also, what version of DI Studio are you using?

 You could either send as an attachment here in the Forum or email as attachments to david.ghan@sas.com.

 

Patrick
Opal | Level 21

@DavidGhan

If the OP can't share his data publicly but only with you then can you please share here what the issue was once resolved?

Mitreto
Fluorite | Level 6

Hi all,

I'm sorry for the delayed reply, I was otherwise occupied.

I'll send the package to @DavidGhan and will make sure that and answer is posted once the issue is resolved.

 

Thank you

Marin

DavidGhan
SAS Employee

Marin sent me his job and table metadata and his source data. With this I was able to reconstruct his job and replicate the issue he found with duplicates rows when re-running the SCD Type 2 Loader transformation with the same source table. The reason this was happening was because the LENGTH attribute differed for type 2 columns in the source table versus the target table for the SCD Type 2 Loader transformation. When comparing the values for these type 2 columns in the source table versus the target, values were considered different even though the only difference was in the number of trailing blanks. Executing the job does generate warnings in the log regarding the difference in the LENGTH in the source table versus the target table. This issue was resolved by making sure that the LENGTH attribute of these type 2 columns was the same in both the source and target table. 

 

Note: when modifying the LENGTH attribute of a column should be taken you set the LENGTH to be less than originally. This may result in truncation of the value. If uncertain, it is better to set the LENGTH in both the source and target column to be the largest of the two.

Patrick
Opal | Level 21

Makes sense. Blanks are relevant for md5() when calculating the digest value.

Mitreto
Fluorite | Level 6

Hi all,

first, please accept my appologies for not getting back to you all and my delayed reply.

David, thank you for taking care of the case.

 

The only comment I have is that the target table was created as a copy of the source table. I'm not sure where the different lengths are coming from. If you look at the process flow no transformation should cause the issue as identified.

 

Thank you very much for your support.

 

Marin

ZUPA1
Calcite | Level 5

Hi

 

I am having the same issue. However the LENGTH of all variables are the same on the SOURCE and TARGET. Is there anything else I need to do that can fix this issue?

 

Kind regards

 

Zubair

Patrick
Opal | Level 21

@ZUPA1

Make sure the lengths are not only the same in SAS metadata but also in the underlying physical tables (=metadata is in sync with physical tables).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 5009 views
  • 4 likes
  • 4 in conversation