SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

"Duplicate values not allowed" on my pk_index in DI Studio

Frequent Contributor
Posts: 90

"Duplicate values not allowed" on my pk_index in DI Studio

I've created a job that uses a composite primary key made up of eight variables. When I run the job, the final Table Loader fails with the error "Duplicate values not allowed on index pk_index for file MYTABLENAME." The Constraint and Index conditions of the Table Loader are both set to "After Load: On (table creation)". What is a good way to resolve this? I've never handled a situation like this before.

Thanks for your time.


Accepted Solutions
‎10-01-2012 05:07 AM
Super User
Posts: 5,427

Re: "Duplicate values not allowed" on my pk_index in DI Studio

Posted in reply to EinarRoed

Obviously, you have duplicate values in your source data.

How to handle this is a matter of process.

Perhaps you have the source data in some kind of staging table, which will make it easy to examine the data and determine the duplicates.
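
For example, if the staging data sits in a SAS table, a quick PROC SQL check along these lines will list the key combinations that occur more than once. This is only a sketch: work.staging and the columns key1-key8 are placeholders for your actual staging table and the eight variables in your composite key.

    proc sql;
      /* List every combination of the eight key variables that occurs more than once */
      create table work.dup_keys as
        select key1, key2, key3, key4, key5, key6, key7, key8,
               count(*) as n_rows
        from work.staging
        group by key1, key2, key3, key4, key5, key6, key7, key8
        having count(*) > 1;
    quit;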

If duplicates are common, you could add some logic to your flows to find the duplicates prior to the load step and have them stored in a separate table. The Data Validation transformation could be used for this.
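
If you prefer to do the split in code instead of (or in addition to) the transformation, a PROC SORT with NODUPKEY and DUPOUT keeps one row per key and routes the extra rows to a separate table. Again a sketch with placeholder names:

    proc sort data=work.staging
              out=work.staging_clean   /* one row per key combination, ready to load */
              dupout=work.staging_dups /* the rows that would have violated pk_index */
              nodupkey;
      by key1 key2 key3 key4 key5 key6 key7 key8;
    run;

Note that NODUPKEY keeps the first row it encounters for each key, so make sure the sort order reflects which row you want to survive.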

About the constraints: what are your requirements for data quality? Having constraints on a target like this is effectively a guarantee that the data is consistent; you pay with a little overhead during the load. The trade-off is otherwise having to restore, or in other ways clean up, the target table after a load of duplicate values.
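
For reference, a unique composite index like pk_index can be (re)created on the target with PROC DATASETS. This is only a sketch of the general technique, not necessarily the code the Table Loader generates for "After Load: On (table creation)"; mylib.MYTABLENAME and key1-key8 are placeholders:

    proc datasets library=mylib nolist;
      modify MYTABLENAME;
      /* Rebuild the composite index; UNIQUE makes the step fail if duplicate keys exist */
      index create pk_index=(key1 key2 key3 key4 key5 key6 key7 key8) / unique;
    quit;

The step fails in the same way as the load if duplicates are already present, so it is also a cheap way to verify the table after cleanup.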

Data never sleeps


