I am having a hard time understanding the following:
If a record in the left table does not have a matching record in the right table, it is marked as "deleted". If a record in the right table does not have a matching record in the left, it is marked as "added".
1. What does it mean for a record to be marked as "deleted" or "added"? Does it mean that a new field is added to the Cluster Diff report with a "deleted" or "added" flag? If so, what is the purpose of this?
2. It looks like DataFlux doesn't let me specify a match definition. So how does DataFlux determine that a record in the left table has no matching record in the right table, and vice versa?
3. The diff types Combine, Divide, Same, and Network are typically shown in a Cluster Diff report. Apart from indicating whether the records are in the same cluster in the left and right tables within the same Diff Set, what else do they tell us?
Thanks!
The Cluster Diff node operates on two sets of matched ("clustered") records. Normally, this node is used to test changes to clusters from one run to the next. You may be testing matching rules and want to see how different rules affect which records end up in which clusters, or you may use the same match rules in both runs while your input set of records has changed. Usually the change is additional records in the second run, for example when an operational system in your organization gains more customers, orders, etc. over time. The key to all of this is the cluster identifier: when two records share one, they are the same according to your matching criteria (or nearly the same, with fuzzy matching). The second key requirement is that each record has a stable, unique row identifier, since that is what links a record in the left table to the same record in the right table.
Ron
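To make the two ideas in the answer concrete (a stable row ID pairs a record across runs, and the cluster ID says which records matched), here is a minimal Python sketch. It is not DataFlux code. The dict-based inputs and the function names `added_deleted`, `diff_sets`, and `classify` are hypothetical, and the Combine/Divide/Same/Network rules are an assumption inferred from the diff type names, not taken from DataFlux documentation.

```python
from collections import defaultdict

# Hypothetical representation of one matching run: stable row ID -> cluster ID.
# This is an illustration, not a DataFlux data structure or API.
Run = dict[str, str]
Node = tuple[str, str]  # ("L" or "R", cluster ID)


def added_deleted(left: Run, right: Run) -> dict[str, list[str]]:
    """Pair records across runs purely by row ID (no match definition needed)."""
    return {
        # Row ID present only in the left run: no partner on the right.
        "deleted": sorted(left.keys() - right.keys()),
        # Row ID present only in the right run: no partner on the left.
        "added": sorted(right.keys() - left.keys()),
    }


def diff_sets(left: Run, right: Run) -> list[tuple[set[str], set[str]]]:
    """Group left and right clusters that share at least one row ID.

    Uses union-find over cluster nodes; each group is one (assumed) diff set
    of (left cluster IDs, right cluster IDs). Clusters whose rows were all
    added or all deleted are not included.
    """
    parent: dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Node, b: Node) -> None:
        parent[find(a)] = find(b)

    # A row ID present in both runs links its left cluster to its right cluster.
    for row_id in left.keys() & right.keys():
        union(("L", left[row_id]), ("R", right[row_id]))

    groups: dict[Node, tuple[set[str], set[str]]] = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        side, cluster_id = node
        lefts, rights = groups[find(node)]
        (lefts if side == "L" else rights).add(cluster_id)
    return list(groups.values())


def classify(lefts: set[str], rights: set[str]) -> str:
    """ASSUMED diff type rules, inferred from the type names only."""
    if len(lefts) == 1 and len(rights) == 1:
        return "Same"      # one left cluster maps to one right cluster
    if len(rights) == 1:
        return "Combine"   # several left clusters merged into one right cluster
    if len(lefts) == 1:
        return "Divide"    # one left cluster split into several right clusters
    return "Network"       # merges and splits mixed in one diff set


if __name__ == "__main__":
    left = {"r1": "A", "r2": "A", "r3": "B", "r4": "C", "r5": "C", "r6": "D"}
    right = {"r1": "X", "r2": "X", "r3": "X", "r4": "Y", "r5": "Z", "r7": "W"}
    print(added_deleted(left, right))
    for lefts, rights in diff_sets(left, right):
        print(sorted(lefts), sorted(rights), classify(lefts, rights))
```

With the sample data, r6 is reported as deleted and r7 as added, clusters A and B form a Combine set with X, and cluster C forms a Divide set with Y and Z. Note that no match definition is involved in pairing records; only the row IDs are compared, which is why a stable, unique row identifier matters.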