The latest stable release of SAS Viya (2020.1.4) added the following to the SAS Information Catalog:
The overview of an information asset now includes information privacy, time period covered, and spatial area covered. These features require the SAS Information Governance license.
This pop-up window contains a breakdown of the AUSCUST (Australian customers) data asset into Sensitive, Private, and Candidate semantic classifications. You can see tokens representing Delivery Address, City, Postal Code, Latitude, and Longitude in the Candidate classification. If no private data is detected, none is displayed. The pop-up window also references the Quality Knowledge Base (QKB) locale that is used to classify the data, which we will discuss below.
The Time Period Covered field displays the date range, for example, January 1, 2015 – December 31, 2015. The time period value is based on the dates that are found in the data. When I saved the data, I added a WHERE clause on the year 2015.
In the data asset MELBRE (Melbourne real estate transactions), the Area Covered field pop-up window displays the top spatial values found from various fields, such as suburb and region. These correspond to the top frequency distribution values. The Quality Knowledge Base (QKB) locale used, Australia (English), is also displayed.
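To make the idea of "top frequency distribution values" concrete, here is a minimal Python sketch. It is not how SAS Information Catalog computes Area Covered; the function name `top_values` and the sample suburb list are hypothetical and only illustrate counting the most frequent values in a column.

```python
from collections import Counter


def top_values(values, n=3):
    """Return the n most frequent non-empty values with their counts."""
    # Ignore missing or blank entries, and trim surrounding whitespace.
    counts = Counter(v.strip() for v in values if v and v.strip())
    return counts.most_common(n)


# Hypothetical sample of a suburb column (not taken from MELBRE).
suburbs = ["Richmond", "Carlton", "Richmond", "Brunswick", "Richmond", "Carlton"]
print(top_values(suburbs, n=2))
```

Running the same count over each spatial column (suburb, region, and so on) would give the kind of short list that the Area Covered pop-up window displays.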
In a second example, the city Houston is displayed in the Area Covered field.
For the full description of these new attributes, see the Overview Tab in SAS Information Catalog: User’s Guide.
If you have licensed SAS Information Governance, discovery agents can analyze the names and content of columns in assets. You can now select the discovery locale whose country and language are appropriate for the assets that the agent discovers. The agent uses the discovery locale to perform identification analysis on those column names and content. For example, if the assets contain names and addresses from the United States in English, select the United States (English) locale.
If you know that the assets in your library are from China, select China (Chinese); if they are from Belgium, select Belgium (Dutch or French), and so on.
When the discovery is finished, the Semantic Type indicates the most likely classification. These classifications are summarized on the Overview page.
Behind the semantic type calculation sits either identification analysis (field content) or field name analysis.
Identification analysis, or field content analysis, is the more precise of the two. It analyzes sample data and produces a list of candidate classifications.
For example, a phone number can be classified as Phone with a score of 9 and Credit card with a score of 3. Why credit card? Well, for the software, a number is a number and if it starts with 4 and has 16 digits, it might look like a VISA card.
At the end, the classifications are ranked by score: Phone 9, Credit card 3, and so on. The top one is chosen as the Semantic Type: Phone. Keep in mind that the classification has a degree of confidence built in. It is not perfect and there might be false positives, but it gives you a good idea without much effort.
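The ranking step described above can be sketched in a few lines of Python. This is an illustration only, not the QKB implementation: the function name `pick_semantic_type` is hypothetical, and the scores are the example values from this article.

```python
def pick_semantic_type(candidates):
    """Rank candidate classifications by score and return (best, ranked).

    `candidates` maps a classification name to its score.
    """
    # Sort from the highest score to the lowest.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    # The top-ranked candidate becomes the semantic type (None if no candidates).
    best = ranked[0][0] if ranked else None
    return best, ranked


# Example scores from the text: Phone 9, Credit card 3.
scores = {"Credit card": 3, "Phone": 9}
best, ranked = pick_semantic_type(scores)
print(best)
print(ranked)
```

Because only the top candidate is kept, a column whose best score is only slightly ahead of the runner-up is where false positives are most likely.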
Field name analysis simply looks at the column name. If the name matches certain keywords, such as phone, mobile, or GSM, the column is classified as Phone. Field name analysis does not peek inside the column data, so if you have account numbers in a column named Phone, the software still displays Phone.
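As a rough illustration of field name analysis, the Python sketch below matches a column name against keywords. The keyword list contains only the three examples mentioned above, and the function name `classify_by_field_name` is hypothetical; the actual keyword lists and matching rules depend on the QKB locale.

```python
# Keywords taken from the examples in the text; the real lists are locale-specific.
PHONE_KEYWORDS = ("phone", "mobile", "gsm")


def classify_by_field_name(column_name):
    """Return "Phone" if the column name contains a phone keyword, else None."""
    name = column_name.lower()
    if any(keyword in name for keyword in PHONE_KEYWORDS):
        return "Phone"
    return None


# The column content is never inspected, only the name.
print(classify_by_field_name("Customer_Phone"))
print(classify_by_field_name("account_id"))
```

A column named Phone is labeled Phone here even if it holds account numbers, which is exactly the limitation described above.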
Identification analysis (field content) is more precise, but not available for all country / language pairs (called locales). For more information and to understand what is available for your country / language, see Understanding Content Analysis in SAS Information Catalog: Administrator’s Guide. For a very detailed list of Definitions by Locale see SAS Quality Knowledge Base for Contact Information 32.
We looked at the new features available in the SAS Information Catalog in the latest stable SAS Viya release (2020.1.4): information privacy, time period covered, and area covered. We then explained the role of the Quality Knowledge Base and the influence of locale selection for discovery agents.
Thank you Mary Kathryn Queen, Vincent Rejany and Kumar Thangamuthu.
Thank you for your time reading this article. If you liked the post, give it a thumbs up. Please comment and tell us what you think about the new SAS Information Catalog.
Find more articles from SAS Global Enablement and Learning here.