margautz
Fluorite | Level 6

Dear All,

 

I have a question about DataFlux (Data Management Studio 2.4).

 

I am comparing two lists to find which names appear in both, applying different sensitivity levels ("Sensitivity XX%") to the name field to find possible matches.

 

I noticed that the match code tool reads the field from left to right when it creates the match code.

E.g.:

 

1- BANQUE POPULAIRE BOURGOGNE FRANCHE COMTE

2- BANQUE POPULAIRE ALSACE LORRAINE CHAMPAGNE

3- BANQUE POPULAIRE LORRAINE CHAMPAGNE

At 65% sensitivity, lines 1 and 2 match, but line 3 does not.

 

If I change the order of the words:

1- BANQUE POPULAIRE BOURGOGNE FRANCHE COMTE

2- BANQUE POPULAIRE LORRAINE CHAMPAGNE ALSACE 

3- BANQUE POPULAIRE LORRAINE CHAMPAGNE

At 65% sensitivity, lines 2 and 3 match, but line 1 does not.

 

Is it possible to create a match code (or a workaround) that does not take word order into account, at the cost of a lower match sensitivity? I mean, I do not expect BANQUE POPULAIRE to equal POPULAIRE BANQUE at sensitivity = 100%, but at a lower sensitivity, perhaps yes.

 

Thank you for your help and for your time.

 

Best regards,

Margherita

3 REPLIES
RonAgresta
SAS Employee

Hi,

 

You can use the Customize component in DM Studio to see exactly what is happening at each step of the match code generation. Access it through the "Administration" riser in DM Studio, then expand "Quality Knowledge Bases," open the locale you are using, and find and open the match definition. Then add some sample values in the lower left corner of the application and step through the definition actions to see where changes are being made. You can use Customize to modify the behavior of the match code generation, but make sure you make a copy of the definition (or of the QKB) first.

 

Ron

 

margautz
Fluorite | Level 6

Hello Ron,

 

Thank you for your answer. Indeed, before writing the question I used the Customize component to understand the behaviour.

 

I know that I can change the code for the match code generation, but I would prefer to avoid it.

I thought I could split/parse the names and match on each part, but I do not know upfront how many fields/words I will need. Or maybe I could exclude some (recurrent) words from the match code generation, but I do not know how. Or maybe there are other workarounds.

 

I noticed that if I lower the sensitivity to 50%, the names do match, but that is too loose. I tried it, and with my two lists it matches BANQUE POPULAIRE ALSACE LORRAINE CHAMPAGNE with 30 different names that start with BANQUE POPULAIRE.

 

Thank you anyway for your advice.

 

Best regards,

Margherita

 

RonAgresta
SAS Employee

Some take the approach of parsing on white space, removing "noise" words, alphabetizing the remaining words, generating match codes for those, and then clustering.
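
The pre-processing part of that approach (parse, drop noise words, alphabetize) could be sketched outside DataFlux roughly as below. This is only an illustration in Python, not DataFlux code: the function names and the noise-word list are hypothetical, and the word-overlap score is a simple stand-in for comparison, not the match code algorithm from the QKB. The alphabetized string is what would then be passed to match code generation and clustering.

```python
# Hypothetical noise-word list, hand-picked for this example; in practice it
# would be built from the words that recur across the two lists.
NOISE_WORDS = {"BANQUE", "POPULAIRE"}


def order_free_name(name, noise_words=NOISE_WORDS):
    """Split on white space, drop noise words, and alphabetize what is left."""
    words = [w for w in name.upper().split() if w not in noise_words]
    return " ".join(sorted(words))


def token_overlap(a, b, noise_words=NOISE_WORDS):
    """Share of distinct remaining words that two names have in common.

    A plain Jaccard overlap, used here only as a rough stand-in;
    it is NOT a DataFlux match code or sensitivity value.
    """
    wa = set(order_free_name(a, noise_words).split())
    wb = set(order_free_name(b, noise_words).split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


names = [
    "BANQUE POPULAIRE BOURGOGNE FRANCHE COMTE",
    "BANQUE POPULAIRE ALSACE LORRAINE CHAMPAGNE",
    "BANQUE POPULAIRE LORRAINE CHAMPAGNE ALSACE",
    "BANQUE POPULAIRE LORRAINE CHAMPAGNE",
]

# Show the order-free string that would be fed to match code generation.
for n in names:
    print(order_free_name(n), "<-", n)
```

With this pre-processing, both orderings of the ALSACE / LORRAINE / CHAMPAGNE name produce the same string, so word order no longer influences whatever match code is generated from it.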


Ron
