WWD
Obsidian | Level 7

In the course notes for Module 1, Preparing Big Data for Analysis and Reporting, page 4-107, there is a table that explains the definition of “Match Codes”. The second column from the left holds a variable named “Match Code @85 Sensitivity”. All the values for this variable start with “4B&~2”.

 

I could not find in the preceding documentation how the value for this variable is to be interpreted.  I’m assuming that this is a regular expression for how the associated string is to be parsed.

 

May I get some help on how this string is constructed (user or system defined) and how this string is used, please?

 

Thank you.

 

Bill Donaldson

Cynthia_sas
SAS Super FREQ

Hi:
  Here's some feedback from the course instructors:

 

=== === Longest feedback from an instructor === ===


To create a match code, the steps on the slide on that page (slide 45) are followed.

 

First it takes the string John Q Smith and parses it into tokens:

Given Name: John

Middle Name: Q

Family Name: Smith

 

Then if there are any noise words those are removed:

<there aren’t any for the string>

 

Transformations are made:

John > Jon

 

Phonetics are applied:

<not sure if any are applied in this case>

 

Then based on sensitivity, relevant components (tokens) are determined:

At 85% sensitivity, the middle name is not significant, but the given name and family name are.

At this point we have <START>Smith                      Jon                               <END>

The last name gets lots of spaces (in case it is long), and so does the first name. The middle name gets one space, but it is not significant at 85%, so it is removed.

 

Then a keyboard transformation is applied:

 

SMITH = 4B&~2

<blanks> = $

JON = C@P

=== ===
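
For readers who want to see the sequence of steps as code, here is a rough Python sketch of the walkthrough above. It is not the actual QKB logic: the noise-word list, the field widths, and the sensitivity threshold for the middle name are all hypothetical, the phonetics step is skipped, and the final encoding is just a lookup table holding the two values quoted above (SMITH = 4B&~2, JON = C@P).

```python
# Illustrative sketch only; the real logic lives in the QKB match definition.

NOISE_WORDS = {"MR", "MRS", "MS", "DR"}      # hypothetical noise-word list
TRANSFORMS = {"JOHN": "JON"}                 # from the thread: John > Jon
ENCODED = {"SMITH": "4B&~2", "JON": "C@P"}   # the only encodings given in the thread
PAD = "$"                                    # from the thread: <blanks> = $
FIELD_WIDTH = {"family": 15, "given": 15, "middle": 1}  # hypothetical widths
MIDDLE_NAME_MIN_SENSITIVITY = 95             # hypothetical threshold


def parse_name(name):
    """Naive parse: first word is the given name, last is the family name."""
    words = [w.strip(".,") for w in name.upper().split()]
    words = [w for w in words if w and w not in NOISE_WORDS]  # drop noise words
    if len(words) < 2:
        raise ValueError("expected at least a given and a family name")
    return {"given": words[0], "middle": " ".join(words[1:-1]), "family": words[-1]}


def encode(token, width):
    """Look up the encoded form of a token and pad it to a fixed width."""
    if not token:
        return PAD * width
    if token not in ENCODED:
        raise ValueError(f"no encoding known for {token!r} in this sketch")
    return ENCODED[token].ljust(width, PAD)


def match_code(name, sensitivity=85):
    """Chain the steps: parse, transform, pick significant tokens, encode."""
    tokens = parse_name(name)
    tokens = {k: TRANSFORMS.get(v, v) for k, v in tokens.items()}
    fields = ["family", "given"]  # significant at 85, per the instructor
    if sensitivity >= MIDDLE_NAME_MIN_SENSITIVITY:
        fields.append("middle")
    return "".join(encode(tokens[f], FIELD_WIDTH[f]) for f in fields)


code = match_code("John Q Smith", sensitivity=85)
print(code)
```

With these assumptions, "John Q Smith", "John Smith", and "Mr. Jon Smith" all produce the same string, which is the point of a match code: the value is an encoded key generated by the system, not a regular expression.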

=== === More feedback from second instructor === ===

The earlier chapters discuss construction of match code strings “lightly”. Mostly in this chapter we are showing how a match code string can help us in entity resolution within a single data file. We explore a bit more about the QKB in subsequent chapters 7-9. A match definition (which generates a match code) is discussed in more detail in Ch9 *but* it is necessary to understand Ch7 & Ch8 before jumping to Ch9.

=== ===
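
To illustrate the entity resolution idea mentioned by the second instructor, here is a small Python sketch that groups the records of a single file by a shared key. The `toy_match_code` function is a hypothetical stand-in for a real match code (it keeps the family name and a normalized given name and ignores any middle name), so only the grouping step reflects the technique itself.

```python
from collections import defaultdict

TRANSFORMS = {"JOHN": "JON"}  # from the thread: John > Jon


def toy_match_code(name):
    """Hypothetical stand-in for a real match code: family name plus
    normalized given name, with any middle name ignored."""
    words = name.upper().split()
    given = TRANSFORMS.get(words[0], words[0])
    return f"{words[-1]}|{given}"


def cluster_by_match_code(names):
    """Group record indices that share the same match code."""
    clusters = defaultdict(list)
    for index, name in enumerate(names):
        clusters[toy_match_code(name)].append(index)
    return dict(clusters)


records = ["John Q Smith", "Jon Smith", "JOHN SMITH", "Mary Jones"]
clusters = cluster_by_match_code(records)
print(clusters)
```

Records that land in the same group are candidates for being the same entity, even though their original strings differ.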

 

Hope this helps. It sounds like a review of Chapters 7-9 is what they recommend.


Cynthia

WWD
Obsidian | Level 7

Thank you!

 

This helps me concentrate my studies.

 

Bill

 
