The SAS Information Catalog is finally here! Read the article to understand how metadata is collected with a discovery agent (crawler) and added to the catalog. Learn what content can be crawled from a caslib and a SAS compute library and how to monitor the agents. The post gives you a preview of the SAS Viya version 2020.1.3 – February 17th, 2021 release.
In 7 Ways to Use the New SAS Information Catalog you might have read how catalog users can search the catalog and assess and understand the data assets. This post focuses on the metadata collection process.
The collected metadata is stored in the information catalog. You add information to the catalog by running discovery agents on libraries. These agents, also known as bots or crawlers, "crawl" through the content of caslibs or SAS compute libraries. They collect metadata from the physical tables or files inside the library and calculate many metrics in the process.
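To make the idea of crawling concrete, here is a minimal Python sketch of what a crawler does conceptually: walk every table in a library, record basic metadata, and compute a few metrics per column. It is an illustration only, not how the SAS discovery agents are implemented. The function `crawl_library`, the two profile classes, the in-memory `demo_library`, and the chosen metrics (row count, null count, distinct count) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnProfile:
    name: str
    null_count: int
    distinct_count: int


@dataclass
class TableProfile:
    library: str
    table: str
    row_count: int
    columns: list = field(default_factory=list)


def crawl_library(library_name, tables):
    """Profile every table in a library given as {table_name: [row_dict, ...]}."""
    profiles = []
    for table_name, rows in tables.items():
        # Collect column names in first-seen order across all rows.
        column_names = []
        for row in rows:
            for key in row:
                if key not in column_names:
                    column_names.append(key)
        profile = TableProfile(library_name, table_name, len(rows))
        for col in column_names:
            values = [row.get(col) for row in rows]
            non_null = [v for v in values if v is not None]
            profile.columns.append(
                ColumnProfile(col, len(values) - len(non_null), len(set(non_null)))
            )
        profiles.append(profile)
    return profiles


# Hypothetical in-memory "library" with two small tables (illustration only).
demo_library = {
    "customers": [
        {"id": 1, "country": "FR"},
        {"id": 2, "country": None},
        {"id": 3, "country": "FR"},
    ],
    "orders": [{"order_id": 10, "amount": 99.5}],
}

catalog = crawl_library("demo", demo_library)
for entry in catalog:
    metrics = [(c.name, c.null_count, c.distinct_count) for c in entry.columns]
    print(entry.table, entry.row_count, metrics)
```

The resulting list of profiles plays the role of the catalog: it holds descriptions of the tables, not the data itself, which is what makes it searchable later.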
SAS Information Catalog discovery agents can ingest metadata from global CAS libraries (caslibs) or SAS compute libraries. As a consequence, data sources covered by a SAS Data Connector, SAS Data Connect Accelerator, or SAS/ACCESS engine become discoverable. (SAS/ACCESS for Hadoop needs some extra path options.) Let’s see two examples:
Findings:
Examples:
To discover new assets with the SAS Information Catalog:
You can create global caslibs:
In SAS Information Catalog, create a discovery agent. Choose or search the caslib to be discovered:
When the job status is Idle, the job has completed.
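If you wanted to script the "wait until Idle" check rather than watch the interface, the logic would look roughly like the Python sketch below. It is a generic polling loop under stated assumptions: `wait_until_idle` and the `get_status` callable are hypothetical, no real SAS API is called, and "Running" is a made-up intermediate status. Only "Idle" comes from the behavior described above.

```python
import time


def wait_until_idle(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it returns "Idle"; return the list of statuses seen."""
    seen = []
    waited = 0.0
    while True:
        status = get_status()
        seen.append(status)
        if status == "Idle":
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"agent still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for a real status lookup: replays a made-up sequence of statuses.
fake_statuses = iter(["Running", "Running", "Idle"])
history = wait_until_idle(
    lambda: next(fake_statuses), poll_seconds=0.0, sleep=lambda seconds: None
)
print(history)
```

In a real setup, `get_status` would be replaced by whatever mechanism your environment offers for reading the agent status. The timeout keeps the loop from waiting forever if the job never finishes.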
You can trace the execution in SAS Environment Manager, under Jobs and Flows:
Each discovery agent is made of jobs. The jobs run chronologically in this order:
The jobs have a series of parameters. You cannot adjust them in the interface (yet):
The metrics produced by the agent are shown in 7 Ways to Use the New SAS Information Catalog.
The other library type you can crawl is a SAS Compute library. You can create compute libraries in SAS Studio. See Working with Libraries in SAS Studio: User’s Guide. In the following example, you might define a BASE (V9) SAS library. Tips:
The crawl process is similar to the one for caslibs. Only this time, the whole process runs in SAS Compute:
When the third step, the Analyze <discovery agent name> connection job, kicks in, the metrics are collected mostly with:
Watch the discovery agent at work in this one-minute video:
While the product is called SAS Information Catalog, there are two licenses:
Please note you need a SAS Information Governance license to:
See the following resource for the complete set of features.
The latest stable release of SAS Viya (2020.1.4) added the following to the SAS Information Catalog: Locale selection for Discovery Agents, Information Privacy, Time Period, Area Covered for Assets. Read more about it here.
In the brand-new Information Catalog on SAS Viya version 2020.1.3, you can run discovery agents on libraries (caslibs or SAS compute libraries). These agents crawl through the libraries, collect the metadata from the tables, and calculate many metrics in the process. The agents run as jobs that can be monitored in SAS Environment Manager. They bring the metadata into the SAS Information Catalog. Then you can use the powerful search engine to help you find the data assets you need. See the videos in 7 Ways to Use the New SAS Information Catalog.
Acknowledgements: Nancy Rausch, Kumar Thangamuthu and Vincent Rejany.
Thank you for your time reading this post. If you liked the post, give it a thumbs up. Please comment and tell us what you think about the new SAS Information Catalog.
Find more articles from SAS Global Enablement and Learning here.