
7 Ways to Use the New SAS Information Catalog

Started ‎03-16-2021 by
Modified ‎08-29-2021 by

1_bt_120_SAS_Information_Catalog_column_details_4-1024x497.png




The SAS Information Catalog is finally here! This article gives you a preview of the SAS Viya 2020.1.3 release (February 17th, 2021). Read it to learn how to make good use of this brand-new SAS product. The SAS Information Catalog helps you find the data you need for your business purpose. It gives you a place to ingest metadata from data sources. You can then use that metadata to find relevant data for your business goals and to understand the data sets you need.


There is no fixed playbook for how the SAS Information Catalog should be used, or by whom. In fact, everyone might find a piece that is useful in their line of work. Let's look at just:


7 ways you could use the SAS Information Catalog


  1. Find what you need
    1. Standard search
    2. Syntax search
  2. Assess the usability
  3. Understand the content
    1. Column details
    2. Sample data
  4. Judge the data preparation effort
  5. Assess the data quality
  6. Identify private data
  7. Share your knowledge and act

Find what you need

SAS Information Catalog supports standard and syntax search. Both search methods return a list of results with the highest-scored results listed first. Let’s assume you are a data scientist and you need to forecast the household water consumption in a certain region. You need to find the most relevant data sets for your purpose: water meter data, consumption in cubic meters, meter location, etc. You don’t know the data sources, and there are a huge number of them. How do you find the needle in the haystack? The powerful magnet that attracts the needle is the search.


2_bt_010_SAS_Information_Catalog_Search.png


Standard Search

Watch the standard search at work in the first minute of this video:



For example, you can search for parts of words without using wildcard characters. Instead of the word “water” you can enter wat and still see items for water included in the search results. Standard search also supports fuzzy matching, which means that closely related strings such as watr also match terms like water.

Search for water*


3_bt_020_SAS_Information_Catalog_Standard_Search_Keyword-1024x227.png



Search for watr


4_bt_040_SAS_Information_Catalog_Standard_Search_Keyword_Wrong_Spelling-1024x190.png



Several results appear. The tables are listed top-down by relevance. You would get a similar top-three ranking if you searched for tables with water data.

Standard search supports free text entry, which enables you to enter any word or phrase to form a query. You can then run the query at the table or column level. This approach enables you to describe the information asset you need in conversational language, without specialized phrasing or syntax. Elasticsearch is at work behind the scenes.

If you are looking for a specific column, for example the water volume in cubic meters (m3), you can try a wildcard search:

Search for *m3*


5_bt_030_SAS_Information_Catalog_Standard_Search_Keyword_column-1024x174.png




Even more relevant results appear: only tables with a column called Daily_W_C_M3 are displayed.
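To get a feel for why watr still finds water, here is a minimal Python sketch of edit-distance matching. Elasticsearch fuzzy queries are based on Levenshtein edit distance, but the exact fuzziness setting used by the catalog is not stated in this article, so the `max_edits` value and the `fuzzy_match` helper below are assumptions made purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, term: str, max_edits: int = 1) -> bool:
    """Hypothetical helper: a term matches if it is within max_edits of the query."""
    return edit_distance(query.lower(), term.lower()) <= max_edits


if __name__ == "__main__":
    for term in ("water", "meter"):
        print(term, edit_distance("watr", term), fuzzy_match("watr", term))
```

The misspelling watr is one insertion away from water, so it matches with a tolerance of one edit, while an unrelated word such as meter does not.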


Syntax search

The purpose of syntax search is to provide a text interface to create a more specific query. The search operates at a table level. Watch the syntax search at work for one minute in this video:



Search for tables created between 12th and 14th of January. Try dateCreated: [2021-01-12 TO 2021-01-14]


6_bt_050_SAS_Information_Catalog_Syntax_Search_date-940x198.png



Search for tables having a keyword in the table label. Try label:"water"


7_bt_060_SAS_Information_Catalog_Syntax_Search_label-1024x147.png


The table containing "water" in the table label is shown.


8_bt_065_SAS_Information_Catalog_Syntax_Search_label-1024x265.png


Search for assets that contain the keyword “water” or “cluster” in the name:

name:"water"^3 OR name:"cluster"

Here "water" is boosted, so a match on it carries three times the weight of a match on "cluster".


9_bt_070_SAS_Information_Catalog_Syntax_Search_name-1024x249.png


You can refine the query even more: name:"water"^3 OR name:"cluster" AND type:casTable


10_bt_080_SAS_Information_Catalog_Syntax_Search_type-1024x195.png


You get the idea: the search is pretty powerful, and you have several options to refine the query and return useful results. Syntax search is based on the Lucene Query Syntax (LQS). For more details, see the SAS Information Catalog 2020.1.3 production documentation and Apache Lucene - Query Parser Syntax.
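If you assemble such queries often, a few small helpers keep the syntax consistent. The following Python sketch only builds query strings like the ones shown above. The field names (name, type, dateCreated) come from this article, while the helper functions are hypothetical and not part of any SAS API. It also wraps the OR clauses in parentheses, which Lucene query syntax uses for grouping, so that the combination with AND is unambiguous; whether you need the grouping in the catalog's search box is something to verify against the product documentation.

```python
from typing import Optional


def field_phrase(field: str, phrase: str, boost: Optional[int] = None) -> str:
    """Build field:"phrase", optionally followed by a ^boost factor."""
    clause = f'{field}:"{phrase}"'
    return f"{clause}^{boost}" if boost is not None else clause


def date_range(field: str, start: str, end: str) -> str:
    """Build an inclusive range clause such as dateCreated:[2021-01-12 TO 2021-01-14]."""
    return f"{field}:[{start} TO {end}]"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and group them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    names = any_of(field_phrase("name", "water", boost=3),
                   field_phrase("name", "cluster"))
    print(all_of(names, "type:casTable"))
    print(date_range("dateCreated", "2021-01-12", "2021-01-14"))
```

The first printed query is the refined example from above with explicit grouping; the second is the date range example.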


Elasticsearch

SAS Information Catalog uses the Elasticsearch engine. The default configuration for Elasticsearch provides a good experience for most users. Administrators might want to change some options. For general information about Elasticsearch configuration options, see SAS Viya deployment notes on Elasticsearch in Elasticsearch documentation. Elasticsearch has been used for a while inside SAS Visual Investigator. Now for the first time, it is part of a Data Management product.


Calculated Metrics

Watch the rest of this short video to understand how you can best use the calculated metrics:



Assess the usability

The search saved you time and narrowed the results. It is time to take a closer look and see what data you can use. Explore the results:


10_bt_080_SAS_Information_Catalog_Syntax_Search_type-1024x195.png


Open the selected search result and drill down into a screen that contains a table overview. The Overview tab contains summarized textual and graphical information derived from the item’s metadata:

  • How many rows and columns does the table have, and how large is it? 46,720 rows, 21 columns, 9.6 MB.
  • Is this a fairly clean table? Completeness is at 95%.

Ideally, the overview also contains some collective knowledge gathered from other users:

  • A business description from a previous user.
  • A status telling you if the source is useful.

11_bt_090_SAS_Information_Catalog_Overview-1024x496.png


This knowledge can give you the extra confidence to decide that the table is a good candidate for your task.

Notes:

  • Features such as knowledge sharing, collaboration, and tagging are not part of this release. They will be part of a future release (no dates or version communicated yet).
  • In the current version, you cannot see who last set the status and the business description, or when.

Understand the content

In Column Analysis (Descriptive Measures), each column presents statistics about its content. In a few seconds you can assess the content and whether it matches the intended use. Looking at the column metrics, the table contains:

  • Water meters for 50 properties (distinct values).
  • Daily water consumption in cubic meters (m3). This is the metric you need to forecast. You can already see several outliers that you will have to deal with.

12_bt_100_SAS_Information_Catalog_Column_analysis-1024x491.png


The consumption is for a range of dates in 2014 and 2015 (Year minimum and maximum).


12_bt_100_SAS_Information_Catalog_Column_analysis-1024x491.png


Column details

You can also drill down for more information about a selected column. A few examples follow.

A numeric column:


14_bt_120_SAS_Information_Catalog_column_details_1-1024x518.png


A string:


15_bt_120_SAS_Information_Catalog_column_details_2-1024x503.png


A second string:


16_bt_120_SAS_Information_Catalog_column_details_3-1024x504.png


A column containing latitude and longitude:


17_bt_120_SAS_Information_Catalog_column_details_4-1024x497.png



Sample data

A sample data tab enables you to browse a few sample rows, the same as in SAS Data Explorer.


Judge the data preparation effort

The Column Analysis (Metadata Measures) helps you assess whether:

  • You need data preparation or processing first. For example, when data has to be converted from string to numeric.
  • You have useless data. For example, the logical type highlights unary variables such as “City” (Houston). The column might be helpful in a report. When modelling data, the variable selection step might reject it.

18_bt_140_SAS_Information_Catalog_metadata_measures-1024x436.png
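The two checks above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the catalog computes its metadata measures: the `profile_column` function and its result keys are hypothetical, and the rule for "numeric stored as string" is simply "every non-missing value parses as a number".

```python
from typing import Dict, List, Optional


def is_numeric_text(value: str) -> bool:
    """True if the string can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_column(values: List[Optional[str]]) -> Dict[str, bool]:
    """Flag string columns that hold numbers and columns with a single distinct value."""
    non_missing = [v for v in values if v is not None and v.strip() != ""]
    return {
        # Every non-missing value parses as a number: convert before analysis.
        "needs_numeric_conversion": bool(non_missing)
        and all(is_numeric_text(v) for v in non_missing),
        # Only one distinct value: the column carries no information for a model.
        "is_unary": len(set(non_missing)) == 1,
    }


if __name__ == "__main__":
    print(profile_column(["Houston", "Houston", "Houston"]))
    print(profile_column(["1.5", "2", "", "3.25"]))
```

A column such as “City” with the single value Houston is flagged as unary, while a string column that contains only numbers is flagged for conversion.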


Assess the data quality

The Column Analysis (Data Quality Measures) can answer these questions:

  • Is my data complete or unique?
  • Are there any interesting patterns in the data?

19_bt_150_SAS_Information_Catalog_data_quality_measures-1024x448.png
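To make these measures concrete, here is a small Python sketch of how completeness, uniqueness and patterns could be computed for one column. The definitions are assumptions chosen for illustration (completeness as the share of non-missing values, uniqueness as distinct values over non-missing values, and patterns that map uppercase letters to A, lowercase letters to a and digits to 9); the catalog's own definitions may differ, and the sample values are made up.

```python
import re
from collections import Counter
from typing import List, Optional


def non_missing(values: List[Optional[str]]) -> List[str]:
    """Keep values that are neither None nor blank."""
    return [v for v in values if v is not None and v.strip() != ""]


def completeness(values: List[Optional[str]]) -> float:
    """Share of values that are present (assumed definition)."""
    return len(non_missing(values)) / len(values) if values else 0.0


def uniqueness(values: List[Optional[str]]) -> float:
    """Distinct values divided by non-missing values (assumed definition)."""
    present = non_missing(values)
    return len(set(present)) / len(present) if present else 0.0


def pattern(value: str) -> str:
    """Map uppercase letters to A, lowercase letters to a and digits to 9."""
    value = re.sub(r"[A-Z]", "A", value)
    value = re.sub(r"[a-z]", "a", value)
    return re.sub(r"[0-9]", "9", value)


def pattern_counts(values: List[Optional[str]]) -> Counter:
    """Count how often each pattern occurs among the non-missing values."""
    return Counter(pattern(v) for v in non_missing(values))


if __name__ == "__main__":
    postal_codes = ["77002", "77002", None, "TX-77002"]  # made-up sample values
    print(completeness(postal_codes), uniqueness(postal_codes))
    print(pattern_counts(postal_codes))
```

A pattern that occurs only rarely, like the prefixed postal code in the sample, is often the first hint of inconsistent formatting.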


Identify private data

The same tab, Column Analysis (Data Quality Measures), can inform you of private data in the data set. Data identification is at work behind the scenes! The semantic type tells you what private, or potentially private, data you have in your data set. In this example, address, postal code, city, and coordinates are assessed as information privacy candidates.


20_bt_160_SAS_Information_Catalog_data_classification-1024x459.png


In another table, these columns are assessed as private data in the information privacy classification. Depending on the local laws, organizations may not use personal data for a purpose other than the original intent without securing additional permission from the consumer. If you are forecasting water consumption, there might be no reason to process names and phone numbers. It is always best to check with your Data Protection Officer.


21_bt_170_SAS_Information_Catalog_information_privacy-1024x481.png
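The article does not describe how the catalog derives semantic types, so the following Python sketch is only a toy illustration of the general idea: test the values of a column against a pattern per semantic type and flag the column when most values match. The two regular expressions (a US-style postal code and a US-style phone number), the 0.8 threshold and the `guess_semantic_type` function are all assumptions, and the sample values are made up.

```python
import re
from typing import List, Optional

# Hypothetical, deliberately simple patterns; real identification is far richer.
SEMANTIC_PATTERNS = {
    "postal_code": re.compile(r"\d{5}(-\d{4})?"),
    "phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
}


def guess_semantic_type(values: List[Optional[str]], threshold: float = 0.8) -> Optional[str]:
    """Return the first semantic type matched by at least `threshold` of the values."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return None
    for name, regex in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in present if regex.fullmatch(v))
        if hits / len(present) >= threshold:
            return name
    return None


if __name__ == "__main__":
    print(guess_semantic_type(["77002", "77005-1234", "77010"]))
    print(guess_semantic_type(["(713) 555-0100", "713-555-0101"]))
    print(guess_semantic_type(["Houston", "Dallas"]))
```

A column flagged this way is only a candidate: whether it really holds private data, and whether you may process it, remains a question for a person.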


Share your knowledge and act

Ideally, after you analyze the data, you enrich the collective knowledge and share your data discovery with others:

  • Add a business description.
  • Adapt the status.

22_bt_180_SAS_Information_Catalog_Business_description-940x198.png


Adding to the business description helps this syntax search:


23_bt_190_SAS_Information_Catalog_Business_description_search-1024x245.png



Search for description:"cubic meters" OR description:"m3" and the results will include data sets with those keywords in the description.

As mentioned, features such as knowledge sharing, collaboration, and tagging are not part of this release. They will be part of a future release (no dates or version communicated yet).

Finally, you can move to the next level and further explore and visualize, prepare, or manage the data, build models, or explore the lineage.


24_bt_200_SAS_Information_Catalog_action-1024x230.png


Collect metadata and discovery agents

To collect metadata, SAS Information Catalog needs to discover (or “crawl”) these assets. Want to know how to crawl your own caslibs and SAS Compute libraries? Or monitor these agents? Read more in How to Collect Metadata with the SAS Information Catalog.

What is New in the SAS Information Catalog SAS Viya 2020.1.4 Release

The latest stable release of SAS Viya (2020.1.4) added the following to the SAS Information Catalog: Information Privacy, Time Period, Area Covered for Assets, Locale selection for Discovery Agents. Read more about it here.


Conclusions

The brand-new SAS Information Catalog on SAS Viya 2020.1.3 comes with a powerful search engine to help you find the data assets you need. The catalog brings together a series of metrics calculated in different applications. The interface helps you assess the usability of the data, understand the content, drill down into column details, and view sample data. It also shortens the time you need to decide whether to use a certain table, because you can judge the data preparation effort, assess the data quality, and identify private data.


Licensing

SAS Information Catalog will be offered in two variants, basic and advanced:

  • SAS Information Catalog: discover CASlibs, most of the calculated metrics.
  • SAS Information Governance: the above plus semantic types (private data identification and classification) and discover SAS Compute libraries.

Resources

Acknowledgements: Nancy Rausch, Vincent Rejany, Kumar Thangamuthu and Ashish Sharma.


Thank you for your time reading this article. If you liked the article, give it a thumbs up.

Please comment and tell us what you think about the new SAS Information Catalog.


Find more articles from SAS Global Enablement and Learning here.
