beamer108
Quartz | Level 8

Hi

I've read the 'Help' documentation on how to use the clustering conditions but am still unclear on how to use them.

I have data that contains many variables such as key id, key list, file name, name, first name, last name, dob, address, phone number, email etc. New data coming in needs to be matched and clustered to existing data if any of these conditions are met. Key id would be the ideal field to match on, but unfortunately the data we receive comes from many sources, and sometimes the Key id is only unique within its own data source and not across all sources. For that reason we have to match on the other fields that come in.

Is there a hierarchy to how the matching is done in a cluster node - meaning if a match is found on the first condition does that mean the rest of the conditions are not considered?

What exactly does it mean for a 'cross match'?

This is what I have for my first clustering node:  

beamer108_1-1657566349333.png

 

I feel like my matches have become redundant - but because I don't truly understand how this cluster node works I add any match I can think of.

 

One other example I have a question on: we have Key ID as a field, but we also have Key List as a field in case a person has more than one Key ID, in which case all of their Key IDs are added to the list. So in one condition I have Key ID + First Name + File Name, then a cross match to Key ID + Key List + First Name + File Name, a cross match to Key ID + Last Name + File Name, and a cross match to Key ID + Key List + Last Name + File Name.

Can someone explain to me in simple terms what exactly I am matching on for that example?

 

beamer108_2-1657566547774.png

 

If there is documentation that explains the Clustering node and how it's used in more detail, with examples other than what appears in the Help section within Dataflux, please let me know and I'll reference that to learn more. Otherwise I'd appreciate some guidance on how to use this cluster node more effectively.

Thank you

VincentRejany
SAS Employee
Hi

Is there a hierarchy to how the matching is done in a cluster node - meaning if a match is found on the first condition does that mean the rest of the conditions are not considered?
--> No, all matching rules are evaluated
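
To make the "all rules are evaluated" point concrete, here is a rough Python sketch. It is not how the DataFlux node is implemented, just an illustration under two assumptions: every field inside one condition must match, and a match on any one condition is enough to link two records. The records, field names and the `cluster` function are made up for the example.

```python
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "key_id": "K1", "file_name": "crm", "first_name": "ANN", "email": "ann@x.com"},
    {"id": 2, "key_id": "K1", "file_name": "crm", "first_name": "ANN", "email": "al@y.com"},
    {"id": 3, "key_id": "K9", "file_name": "erp", "first_name": "ANNE", "email": "al@y.com"},
    {"id": 4, "key_id": "K5", "file_name": "erp", "first_name": "BOB", "email": "bob@z.com"},
]

# Assumption: fields within a condition are ANDed, conditions are ORed.
conditions = [
    ("key_id", "first_name", "file_name"),
    ("email",),
]

def cluster(records, conditions):
    """Group record ids that are linked, directly or through other records."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        # A pair is linked if ANY condition has ALL of its fields equal and non-blank.
        if any(all(a[f] and a[f] == b[f] for f in fields) for fields in conditions):
            parent[find(a["id"])] = find(b["id"])

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())

clusters = cluster(records, conditions)
print(clusters)
```

In this sketch records 1 and 2 link on the first condition, records 2 and 3 link only on the second, and all three end up in one cluster. That would not happen if evaluation stopped at the first condition.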

What exactly does it mean for a 'cross match'?
A typical scenario is when you have EMAIL1 and EMAIL2 columns. With cross matching, EMAIL1 values are matched against each other, as are EMAIL2 values, and in addition EMAIL1 values are also matched against EMAIL2 values.
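
As a small illustration of the EMAIL1/EMAIL2 case, the Python sketch below compares a same-column match with a cross match. The function names are hypothetical, and ignoring blank values is an assumption of the sketch, not documented node behaviour.

```python
def plain_match(a, b, fields):
    """Same-column comparison only: EMAIL1 with EMAIL1, EMAIL2 with EMAIL2."""
    return any(a[f] and a[f] == b[f] for f in fields)

def cross_match(a, b, fields):
    """Also compare across columns: any non-blank value in a against any in b."""
    values_a = {a[f] for f in fields if a[f]}
    values_b = {b[f] for f in fields if b[f]}
    return bool(values_a & values_b)

# Hypothetical records: the shared address sits in different columns.
rec_a = {"email1": "ann@x.com", "email2": ""}
rec_b = {"email1": "work@y.com", "email2": "ann@x.com"}

fields = ["email1", "email2"]
same_column = plain_match(rec_a, rec_b, fields)
across_columns = cross_match(rec_a, rec_b, fields)
print(same_column, across_columns)
```

Here the two records only match once the columns are compared against each other, which is the situation with Key ID on one record and Key List on another.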

Regarding your scenario, I would suggest you add a source identifier (e.g. CRM -> 1, Call Center -> 2, ERP -> 3), so you can match on Source System ID + Key.
A cross match on Key ID and Key List could make sense.
beamer108
Quartz | Level 8

Thank you 

 

We actually do have a source identifier labeled 'File Name' and we look for where the File Name and Key ID match that of a new record. Is that what you mean? 

VincentRejany
SAS Employee

Looking at your rules, I think you can keep only two (rules 1 and 3). Rules 2 and 4 are redundant, because any records that match on rule 2 or rule 4 already match on rule 1 or rule 3.
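
One way to sanity-check this kind of redundancy outside the tool is sketched below in Python. It assumes rule 2 is rule 1 with Key List added as one more field that must match, and it uses plain equality with no cross matching. The sample records and the `matched_pairs` helper are made up for the example.

```python
from itertools import combinations

# Hypothetical records; values are illustrative only.
records = [
    {"id": 1, "key_id": "K1", "key_list": "K1|K2", "first_name": "ANN", "file_name": "crm"},
    {"id": 2, "key_id": "K1", "key_list": "K1|K2", "first_name": "ANN", "file_name": "crm"},
    {"id": 3, "key_id": "K1", "key_list": "K1", "first_name": "ANN", "file_name": "crm"},
    {"id": 4, "key_id": "K7", "key_list": "K7", "first_name": "BOB", "file_name": "erp"},
]

def matched_pairs(records, fields):
    """Return the set of id pairs whose values agree on every listed field."""
    return {
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if all(a[f] == b[f] for f in fields)
    }

rule1 = ("key_id", "first_name", "file_name")
rule2 = ("key_id", "key_list", "first_name", "file_name")  # rule 1 plus Key List

pairs1 = matched_pairs(records, rule1)
pairs2 = matched_pairs(records, rule2)

# Pairs that rule 2 finds and rule 1 does not; empty means rule 2 adds nothing.
added_by_rule2 = pairs2 - pairs1
print(sorted(pairs1), sorted(pairs2), added_by_rule2)
```

Because rule 2 requires everything rule 1 requires plus one more field, it can only ever find a subset of rule 1's pairs, so dropping it leaves the clusters unchanged under these assumptions.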
