SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Correct misspelling of names in Data Management Studio 2.6

Occasional Contributor
Posts: 14

In DM Studio, the description of the Standardization node says "Makes similar items the same. Examples of standardization are corrected misspellings (Mary instead of Mmary)..." I've tried this Mmary example, along with other misspellings, to get the corrected name. However, no definition or scheme in the QKB that I've seen corrects these misspellings. What am I doing wrong? I'm using the CI 24 QKB.

SAS Super FREQ
Posts: 97

Hi,

I believe the documentation is only illustrating the concept of standardization, not describing a specific correction performed by a standardization definition in the Quality Knowledge Base (QKB). You can, of course, build these kinds of transformations yourself using the Customize component and its associated editors in Data Management Studio. You can also use Customize to see how data is transformed step by step by each data quality algorithm. This would show you where you could add a standardization scheme to an existing standardization definition to correct common name misspellings (though standardizing people's names can be tricky, especially when diminutive names and nicknames are involved).
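To make the idea of a scheme concrete: conceptually, a standardization scheme behaves like a lookup table that maps known variants to one standard value. The Python below is only a rough sketch of that idea, not DataFlux or QKB code. The scheme entries and the function names (`NAME_SCHEME`, `standardize_name`, `standardize_elements`) are hypothetical and are not taken from any shipped QKB.

```python
# Hypothetical scheme: known misspelling (lowercased) -> standard value.
# These entries are illustrative only and do not come from a real QKB.
NAME_SCHEME = {
    "mmary": "Mary",
    "jhon": "John",
    "micheal": "Michael",
}


def standardize_name(value, scheme=NAME_SCHEME):
    """Look up the whole value; return it trimmed but unchanged if there is no match."""
    cleaned = value.strip()
    return scheme.get(cleaned.lower(), cleaned)


def standardize_elements(value, scheme=NAME_SCHEME):
    """Look up each whitespace-separated word separately and rejoin the result."""
    return " ".join(scheme.get(word.lower(), word) for word in value.split())


print(standardize_name("Mmary"))            # whole-value lookup
print(standardize_elements("Mmary Smith"))  # word-by-word lookup
```

A value is corrected only if it appears in the table, which fits what you are seeing: if the scheme behind a definition has no entry for "Mmary", the value passes through unchanged.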


Ron

Occasional Contributor
Posts: 14

Thank you, Ron. I was using the Customize component to see how the data was being transformed, and I saw where one of the nodes transformed a misspelled name properly (perhaps this was the Vocabulary Editor), but I wasn't sure how to incorporate this into a job where it could be used to standardize names. I guess I have some more reading to do. Thanks again.

SAS Employee
Posts: 75

This topic from the user guide might help: Using QKB Definitions in Jobs, Profiles, and Explorations

SAS Employee
Posts: 75

Posted in reply to DaveR_SAS

Also, the default repository (DataFlux Sample) has a number of data jobs that use a QKB to perform data quality operations. See this topic for an intro to data jobs: Overview of Data Jobs

See this topic for a description of the data quality nodes: Data Job Nodes
