SAS Academy for Data Science

WWD · Posted 06-16-2019 12:01 PM

In the course notes for Module 1, Preparing Big Data for Analysis and Reporting, page 4-107, there is a table that explains the definition of “Match Codes”. The second column from the left holds a variable named “Match Code @85 Sensitivity”. All the values for this variable start with “4B&~2”.

I could not find in the preceding documentation how the value for this variable is to be interpreted. I’m assuming that this is a regular expression for how the associated string is to be parsed.

May I get some help on how this string is constructed (user or system defined) and how this string is used, please?

Thank you.

Bill Donaldson

Cynthia_sas · Posted 06-17-2019 12:29 PM

Hi:
Here's some feedback from the course instructors:

=== === Longest feedback from an instructor === ===

To create a match code, the steps on the slide on that page (slide 45) are followed.

First it takes the string John Q Smith and parses it into tokens:

Given Name: John

Middle Name: Q

Family Name: Smith

Then, if there are any noise words, those are removed.

Transformations are made:

John > Jon

Phonetics are applied:

Then based on sensitivity, relevant components (tokens) are determined:

At 85%, the middle name is not significant, but the given name and family name are.

At this point we have <START>Smith Jon <END>

There are lots of spaces for the last name (in case it is long) and lots of spaces for the first name, and one space for the middle name (but it’s not significant at 85% so it is removed).

Then a keyboard transformation is applied:

SMITH = 4B&~2

<blanks> = $

JON = C@P

=== ===
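The steps above can be sketched in a few lines of Python. This is only a toy illustration, not the QKB's actual algorithm: the lookup tables contain just the values quoted in this thread, and the noise-word list, the field widths, and the function names (`parse_name`, `match_code`) are hypothetical placeholders chosen for the example.

```python
# Toy sketch of the match-code steps described above.
# NOT the real QKB logic: tables hold only values quoted in this thread.

NOISE_WORDS = {"MR", "MRS", "DR"}            # hypothetical noise words
TRANSFORMS = {"JOHN": "JON"}                 # from the thread: John > Jon
ENCODED = {"SMITH": "4B&~2", "JON": "C@P"}   # from the thread
# Tokens kept at each sensitivity, in output order, with assumed field widths.
SIGNIFICANT = {85: (("family", 15), ("given", 10))}  # widths are hypothetical


def parse_name(name):
    """Split a name into given / middle / family tokens, dropping noise words."""
    words = [w for w in name.upper().split() if w not in NOISE_WORDS]
    return {
        "given": words[0],
        "middle": " ".join(words[1:-1]),
        "family": words[-1],
    }


def match_code(name, sensitivity=85):
    """Build a match-code-like string for the given sensitivity."""
    tokens = parse_name(name)
    # Apply transformations (e.g. John > Jon).
    tokens = {k: TRANSFORMS.get(v, v) for k, v in tokens.items()}
    parts = []
    for field, width in SIGNIFICANT[sensitivity]:
        encoded = ENCODED[tokens[field]]         # encoding step (lookup only)
        parts.append(encoded.ljust(width, "$"))  # blanks are written as $
    return "".join(parts)


code_a = match_code("John Q Smith")
code_b = match_code("Jon Smith")
print(code_a)
print(code_a == code_b)
```

Because the middle name is dropped at 85% and John is transformed to Jon, both inputs yield the same string, which is what lets match codes group records that likely refer to the same person.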

=== === More feedback from second instructor === ===

The earlier chapters discuss construction of match code strings “lightly”. Mostly in this chapter we are showing how a match code string can help us in entity resolution within a single data file. We explore a bit more about the QKB in subsequent chapters 7-9. A match definition (which generates a match code) is discussed in more detail in Ch9 *but* it is necessary to understand Ch7 & Ch8 before jumping to Ch9.

=== ===

Hope this helps. It sounds like a review of Chapters 7-9 is what they recommend.

Cynthia

WWD · Posted 06-18-2019 05:02 AM

Thank you!

This helps me concentrate my studies.

Bill
