Hi
I hope you are still interested in this topic.
The QKB (Quality Knowledge Base) is used in generating fuzzy codes. It is a collection of file definitions, schemas, chop tables, phonetic libraries, regex libraries, vocabularies, and grammars. These files can be edited in the DataFlux Data Management Studio application. If the out-of-the-box rules don't meet your organisation's needs, you can add or edit files. Personally, I struggled with the vocabulary and grammar files.
Match code generation follows a series of steps. Each step uses the rules from those files to tidy the string: removing noise, standardizing, normalizing, applying phonetic reduction, and finally building a match code layout. It is a lot more than just the Soundex function.
QKB Definition Steps
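To make the steps a bit more concrete, here is a rough Python sketch of the same idea. It is only a toy under my own assumptions, not how the QKB works internally: the rule tables (NOISE_WORDS, STANDARD_FORMS), the function names, and the very crude phonetic rule are all made up for illustration and stand in for the real vocabulary, scheme, and phonetic library files.

```python
import re

# Hypothetical rule tables standing in for QKB vocabulary / scheme files.
NOISE_WORDS = {"THE", "OF", "AND"}
STANDARD_FORMS = {"CORP": "CORPORATION", "INTL": "INTERNATIONAL"}


def normalize(value):
    """Uppercase the value, replace punctuation with spaces, split into tokens."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", value).upper()
    return cleaned.split()


def remove_noise(tokens):
    """Drop tokens that carry no matching value (toy noise-word list)."""
    return [t for t in tokens if t not in NOISE_WORDS]


def standardize(tokens):
    """Map known abbreviations to one standard form (toy scheme)."""
    return [STANDARD_FORMS.get(t, t) for t in tokens]


def phonetic_reduce(token):
    """Toy phonetic rule: keep the first character, drop later vowels,
    then collapse runs of the same letter. Real phonetic libraries are
    far richer than this."""
    if not token:
        return token
    reduced = token[0] + re.sub(r"[AEIOU]", "", token[1:])
    return re.sub(r"(.)\1+", r"\1", reduced)


def toy_match_string(value):
    """Run the steps in order and join the reduced tokens."""
    tokens = standardize(remove_noise(normalize(value)))
    return "".join(phonetic_reduce(t) for t in tokens)


# Two differently written values end up with the same toy string.
print(toy_match_string("The Acme Corp."))
print(toy_match_string("ACME Corporation"))
```

The point is only that each step removes one kind of variation, so two values that look different on the surface can reduce to the same string.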
The sensitivity defines the number of characters used to create a fuzzy code.
Sensitivity
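As a small illustration of the sensitivity idea, the sketch below keeps fewer characters at a lower sensitivity, so more values collapse to the same code. The mapping from sensitivity to a character count (sensitivity // 10) and the function name are my own assumptions for the example; the real mapping comes from the match definition in the QKB.

```python
def apply_sensitivity(match_string, sensitivity):
    """Keep only the leading characters of the reduced string.

    Assumption for illustration: the number of characters kept is
    sensitivity // 10 (at least 1). The QKB's real mapping is different.
    """
    keep = max(1, sensitivity // 10)
    return match_string[:keep]


# At a low sensitivity both values share a code; at a high one they differ.
for sensitivity in (50, 90):
    a = apply_sensitivity("JOHNSN", sensitivity)
    b = apply_sensitivity("JOHNSTN", sensitivity)
    print(sensitivity, a, b, a == b)
```

So a lower sensitivity gives looser matching and a higher sensitivity gives stricter matching.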
Finally, the match code starts out as an unencoded string, which is then converted to an encoded fuzzy string. The encoding logic is hidden, but the node generates it based on the characters.
Encoding
These steps are followed for each value in the column to generate its match code. For a given data value, the match code does not change unless the QKB version changes or the rules are edited. I hope this gives some understanding of how match codes are generated. If you are interested in learning about and editing those files, there are QKB courses on SAS Learning. Wish you luck.
Rama