In the course notes for Module 1, Preparing Big Data for Analysis and Reporting, page 4-107, there is a table that explains the definition of “Match Codes”. The second column from the left holds a variable named “Match Code @85 Sensitivity”. Every value of this variable starts with “4B&~2”.
I could not find in the preceding documentation how the value for this variable is to be interpreted. I’m assuming that this is a regular expression for how the associated string is to be parsed.
May I get some help on how this string is constructed (user or system defined) and how this string is used, please?
Thank you.
Bill Donaldson
Hi:
Here's some feedback from the course instructors:
=== === Longest feedback from an instructor === ===
To create a match code, the steps on the slide on that page (slide 45) are followed.
First it takes the string John Q Smith and parses it into tokens:
Given Name: John
Middle Name: Q
Family Name: Smith
Then if there are any noise words those are removed:
<there aren’t any for the string>
Transformations are made:
John > Jon
Phonetics are applied:
<not sure if any are applied in this case>
Then based on sensitivity, relevant components (tokens) are determined:
At 85% sensitivity the middle name is not significant, but the given name and family name are.
At this point we have <START>Smith Jon <END>
There are lots of spaces for the last name (in case it is long) and lots of spaces for the first name, and one space for the middle name (but it’s not significant at 85% so it is removed).
Then a keyboard transformation is applied:
SMITH = 4B&~2
<blanks> = $
JON = C@P
=== ===
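For anyone who finds code easier to follow, the steps the instructor lists can be sketched roughly in Python. This is only a toy illustration, not SAS code and not the actual QKB match definition: the noise-word list, the field widths of 10, and the function names are all made up for the example, and the encoding table contains only the two values quoted above (SMITH = 4B&~2, JON = C@P), so any other name raises an error rather than guessing at an encoding.

```python
# Toy sketch only: every table and width below is a placeholder, not QKB content.
NOISE_WORDS = {"MR", "MRS", "DR"}            # hypothetical noise-word list
TRANSFORMS = {"JOHN": "JON"}                 # transformation shown in the thread
ENCODED = {"SMITH": "4B&~2", "JON": "C@P"}   # the only encodings given in the thread
FIELD_WIDTHS = {"family": 10, "given": 10}   # hypothetical fixed field widths
SIGNIFICANT_AT_85 = ("family", "given")      # middle name is dropped at sensitivity 85


def parse_name(name):
    """Naively split a name into given, middle, and family tokens."""
    words = [w.strip(".,") for w in name.upper().split()]
    words = [w for w in words if w and w not in NOISE_WORDS]
    if len(words) < 2:
        raise ValueError("expected at least a given name and a family name")
    return {"given": words[0], "middle": " ".join(words[1:-1]), "family": words[-1]}


def encode_field(token, width):
    """Look up the token's encoding and pad the field with '$' for blanks."""
    code = ENCODED[token]  # KeyError for names outside the thread's example
    return code[:width].ljust(width, "$")


def match_code(name):
    """Build a toy match code from the significant tokens, family name first."""
    tokens = parse_name(name)
    tokens = {role: TRANSFORMS.get(tok, tok) for role, tok in tokens.items()}
    return "".join(
        encode_field(tokens[role], FIELD_WIDTHS[role]) for role in SIGNIFICANT_AT_85
    )


print(match_code("John Q Smith"))
```

In this sketch "John Q Smith", "Jon Smith", and "Mr. John Q Smith" all produce the same string, because the middle name and the noise word are dropped and John is transformed to Jon before encoding. The string always begins with 4B&~2 simply because the family name field comes first.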
=== === More feedback from second instructor === ===
The earlier chapters discuss construction of match code strings “lightly”. Mostly in this chapter we are showing how a match code string can help us in entity resolution within a single data file. We explore a bit more about the QKB in subsequent chapters 7-9. A match definition (which generates a match code) is discussed in more detail in Ch9 *but* it is necessary to understand Ch7 & Ch8 before jumping to Ch9.
=== ===
Hope this helps. It sounds like a review of Chapters 7-9 is what they recommend.
Cynthia
Thank you!
This helps me concentrate my studies.
Bill