I have an APPEND process that duplicates data if I run it twice. Is there some way to avoid data duplication using APPEND?
For production-ready code, you shouldn't be able to run the process twice with the same data.
If you can't guarantee that, redesign the job to use an Update loading strategy, or, as @Ksharp suggests, add a post-process that removes duplicates.
The post-process might be fine if your data volume is manageable, but it will certainly hurt performance as the table grows.
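For illustration, here is a minimal post-process sketch, assuming the appended table is WORK.TARGET and ID identifies a unique row (both names are hypothetical placeholders):

/* Remove duplicate keys left behind by a repeated append.        */
/* WORK.TARGET and ID are placeholder names for this sketch only. */
proc sort data=work.target nodupkey;
   by id;   /* keeps the first observation per key, drops the rest */
run;

Note that this rewrites the whole table each run, which is why it gets expensive as the data grows.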
To avoid duplicating data with the APPEND, I created an INDEX in DI Studio on some columns. When I run the job it executes, but it writes the message below and the job finishes with an EXIT in Flow Manager.
Error in the log:
Add/Update failed for data set <TABLE TARGET> because data value(s) do not comply with integrity constraint.
How could I solve that?
Thanks!
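For context, a minimal sketch that reproduces the kind of rejection described above: a unique index on the target table makes PROC APPEND refuse rows whose key already exists. All names (WORK.TARGET, WORK.NEW, ID, VALUE) are hypothetical:

/* Build a small target table and put a unique index on it. */
data work.target;
   id = 1; value = 10;
run;

proc datasets lib=work nolist;
   modify target;
   index create id / unique;   /* enforces uniqueness of ID, like the poster's index */
quit;

data work.new;
   id = 1; value = 99;   /* same key as an existing row */
run;

/* The duplicate row is rejected with a message like the one quoted above: */
/* ERROR: Add/Update failed for data set WORK.TARGET because data          */
/* value(s) do not comply with integrity constraint.                       */
proc append base=work.target data=work.new;
run;

So the index is doing its job: it blocks the duplicates, but the rejection also raises an ERROR, which is why the job exits in Flow Manager.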
I wonder how you came to get duplicate data in the first place. The DI Studio Append transformation (at least in my version 4.9) always deletes the output table before appending (check the transformation code under Properties -> Code), so it should be impossible to get duplicates unless your output table is also an input to a step before the append, in which case the existing data goes into the append together with the new data.
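For reference, a sketch of the delete-then-append pattern described above. This is only an illustration with placeholder names (WORK.TARGET, WORK.STAGING); check Properties -> Code in your installation for the actual generated code:

/* Drop the output table first, then append, so re-running */
/* the job on its own cannot duplicate rows.               */
proc datasets lib=work nolist;
   delete target;
quit;

/* PROC APPEND recreates the base table from the incoming data */
/* when it does not exist.                                     */
proc append base=work.target data=work.staging;
run;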
It is a bit difficult to imagine what's going on from the description alone, so please post a screenshot of your job canvas.