Hi
I've read the 'Help' documentation on how to use the clustering conditions, but I'm still unclear on how to use them.
I have data that contains many variables, such as key id, key list, file name, name, first name, last name, dob, address, phone number, email, etc. New incoming data needs to be matched and clustered with existing data if any of these conditions are met. Key ID would be the ideal field to match on, but the data we receive comes from many sources, and sometimes a Key ID is unique only within its own source rather than across all sources. For that reason, we have to match on the other incoming fields as well.
Is there a hierarchy to how the matching is done in a cluster node? In other words, if a match is found on the first condition, does that mean the rest of the conditions are not considered?
What exactly does 'cross match' mean?
This is what I have for my first clustering node:
I feel like my matches have become redundant, but because I don't truly understand how this cluster node works, I add any match I can think of.
One other example I have a question about: we have Key ID as a field, but we also have Key List as a field. If a person has more than one Key ID, all of their Key IDs are added to the list. So in one condition I have Key ID + First Name + File Name, then a cross match to Key ID + Key List + First Name + File Name, a cross match to Key ID + Last Name + File Name, and a cross match to Key ID + Key List + Last Name + File Name.
Can someone explain to me in simple terms what exactly I am matching on for that example?
If there is documentation that explains the Clustering node and how it's used in more detail, with examples other than what appears in the Help section within DataFlux, please let me know and I'll reference it to learn more. Otherwise, I'd appreciate some guidance on how to use this cluster node more effectively.
Thank you
Thank you
We actually do have a source identifier labeled 'File Name' and we look for where the File Name and Key ID match that of a new record. Is that what you mean?
Looking at your rules, I think you can keep only two (rules 1 and 3). Rules 2 and 4 are redundant: they are rules 1 and 3 with Key List added, so any records that match on rule 2 or rule 4 already match on rule 1 or rule 3.
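If it helps, here is a rough way to check that redundancy outside of DataFlux. This is only a sketch in Python, not how the Clustering node is implemented. It assumes that all fields inside one rule must match (AND), that a match on any one rule is enough to put two records in the same cluster (OR), and that Key List is compared as a plain string. The sample records and the `cluster` helper are made up for illustration.

```python
from itertools import combinations

# Hypothetical sample records; field names mirror the ones in the question.
records = [
    {"key_id": "K1", "key_list": "K1",    "first": "ANN", "last": "LEE", "file": "A.csv"},
    {"key_id": "K1", "key_list": "K1,K7", "first": "ANN", "last": "LEE", "file": "A.csv"},
    {"key_id": "K1", "key_list": "K1",    "first": "BOB", "last": "LEE", "file": "A.csv"},
    {"key_id": "K1", "key_list": "K1",    "first": "ANN", "last": "LEE", "file": "B.csv"},
]

# The four rules from the question, each a tuple of fields that must all match.
RULES = {
    1: ("key_id", "first", "file"),
    2: ("key_id", "key_list", "first", "file"),
    3: ("key_id", "last", "file"),
    4: ("key_id", "key_list", "last", "file"),
}

def cluster(records, rule_ids):
    """Group record indices that are linked, directly or through other
    records, by a match on at least one of the given rules (assumed OR)."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        for rid in rule_ids:
            # Every field in the rule must be equal (assumed AND).
            if all(records[i][f] == records[j][f] for f in RULES[rid]):
                parent[find(i)] = find(j)
                break

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

all_rules = cluster(records, [1, 2, 3, 4])
rules_1_and_3 = cluster(records, [1, 3])
rules_2_and_4 = cluster(records, [2, 4])
```

Under those assumptions, `all_rules` and `rules_1_and_3` come out identical, so dropping rules 2 and 4 loses nothing. Using only rules 2 and 4 gives smaller clusters, because adding Key List makes those rules stricter, not broader.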
Thank you.