
What is a QKB?

by SAS Employee MKQueen on 10-03-2014 02:58 PM, edited on 10-06-2015 08:09 PM by Community Manager

And the answer is not the airport code for Breckenridge, Colorado! 




In the context of SAS® products, QKB refers to the SAS® Quality Knowledge Base. The QKB is a collection of files that store the data and logic defining data cleansing operations such as parsing, standardization, and match code generation to facilitate fuzzy matching. SAS software products reference the QKB when performing data management operations on your data. These products include SAS® Data Integration Studio, SAS® DataFlux® Data Management Studio/Server, SAS® code via dqprocs, SAS® MDM, and SAS® Data Quality Accelerators.
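To make the idea of a match code concrete, here is a toy sketch in Python. It is not the QKB's logic and not a SAS API; the function name `toy_match_code` and its rules (drop punctuation, drop vowels after the first letter, collapse repeated letters, sort the words) are assumptions chosen only to show how differently typed values can reduce to the same key.

```python
import re


def toy_match_code(value: str) -> str:
    """Reduce a string to a crude key so near-duplicates collide.

    Illustrative only: real match definitions are far more elaborate.
    """
    # Lowercase and keep letters and spaces only.
    cleaned = re.sub(r"[^a-z ]", "", value.lower())
    parts = []
    for word in cleaned.split():
        # Keep the first letter and drop vowels from the rest of the word.
        body = re.sub(r"[aeiou]", "", word[1:])
        # Collapse runs of the same letter into one.
        parts.append(re.sub(r"(.)\1+", r"\1", word[0] + body))
    # Sort the words so that word order does not affect the key.
    return "".join(sorted(parts)).upper()


a = toy_match_code("Smith, Jonathan")
b = toy_match_code("jonathan  SMITH")
c = toy_match_code("Jonathon Smith")
print(a, b, c)
```

Records whose keys are equal become candidates for being the same entity, which is the general idea behind comparing match codes instead of raw strings.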




You can customize the definitions in the QKB using SAS® DataFlux® Data Management Studio. This allows you to update the out-of-the-box QKB definitions or create your own data types and definitions to suit your project needs. For example, you may need to create a definition to extract drug name and dosage information from a free-form text field in your pharmaceutical data. If you are interested in learning more about customizing the QKB, you can attend one of the public training sessions on the topic or contact your SAS representative to schedule an on-site offering for your team.




Earlier this year there were two new releases of the QKB for Contact Information (CI): QKB CI 22 and QKB CI 23. The QKB for CI contains data quality definitions for Contact Information data types such as Name, Organization, Address, City, Postal Code, Phone, Email, Country, and URL. It also contains specialty definitions such as Space Removal, Punctuation Removal, and Date/Time. Currently the QKB for CI supports over 35 locales. A locale in the context of the QKB is a language-country combination. This is important because how you parse an Address or standardize a Name can vary by language and country. You can license one or more locales depending on your data cleansing needs.
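As a rough illustration of why locale matters for parsing, consider house numbers: in US English addresses the number usually comes before the street name, while in German addresses it usually follows it. The Python sketch below is a toy, not the QKB's parse logic; the locale keys `en_US` and `de_DE`, the patterns, and the function `toy_parse_street` are hypothetical names used only for this example.

```python
import re

# Hypothetical locale keys and patterns, for illustration only.
PATTERNS = {
    # Number first, then street name: "123 Main Street"
    "en_US": re.compile(r"^(?P<number>\d+\w*)\s+(?P<street>.+)$"),
    # Street name first, then number: "Hauptstraße 5"
    "de_DE": re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+\w*)$"),
}


def toy_parse_street(line: str, locale: str) -> dict:
    """Split a street line into number and street using a per-locale pattern."""
    text = line.strip()
    match = PATTERNS[locale].match(text)
    if match is None:
        # No number found where this locale expects it; keep the text whole.
        return {"number": "", "street": text}
    return match.groupdict()


us = toy_parse_street("123 Main Street", "en_US")
de = toy_parse_street("Hauptstraße 5", "de_DE")
wrong = toy_parse_street("Hauptstraße 5", "en_US")
print(us, de, wrong)
```

Applying the `en_US` pattern to the German address finds no leading number, so the whole line falls back to the street field. That is the kind of mismatch that locale-specific definitions are meant to avoid.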




One of the updated definitions in QKB CI 23 is the Address (Global) Parse definition. This is a definition at the Global level, meaning it is available to all locales regardless of language and country. The definition takes a full street address string (e.g., Name, Street Address, and Building Name as one string) and parses it into its individual components. Here is an example of this definition in action:



Later this year a new version of the QKB for CI as well as a new standalone offering of the QKB for Product Data (PD) will be released.  I will talk about these upcoming QKB releases in a future blog posting.  For more information on QKBs in general and the QKB CI 22 and 23 releases specifically, please refer to the product documentation.
