
Why Are Data Catalogue and Data Governance So Important? Q&A, Slides and On-Demand Recording

Started ‎05-26-2022 by
Modified ‎05-26-2022 by

Watch this Ask the Expert session to learn why a data catalog and other data governance areas are important for improving the delivery of analytic insights.

 

Watch the webinar

 

You will learn:

  • Some of the common challenges that hinder analytic pipeline delivery and reduce business benefits.
  • How to get reliable, trusted data quicker to maximize benefits from the data for reporting or analysis.
  • How to integrate open source products.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Q&A 

Are there any options to integrate or share metadata with other applications? 

I did not cover that, but thanks for asking. Yes, we are working on that, and we already have a way to do it. We have integrated with an open-source metadata standard called Egeria, which is part of the Linux Foundation. It is an open metadata standard, more like a metadata marketplace, where different application vendors can share their metadata. That benefits the organization because you do not have to stitch all of these together yourself. Right now, we contribute our metadata from the SAS Information Governance product to Egeria. If the organization has enabled it, the product automatically sends events for all the assets that the user sees to Egeria, where they can be combined with metadata from other applications so that users can see lineage across different vendors. We are also looking at integrating with OpenLineage and Marquez, which are other open metadata tools, to augment the lineage capability. We wanted to address the analytic use case with discovery and lineage, and also be able to partner and integrate with other tools, including open source. So the simple answer is yes, we can. This metadata is available to extract to third parties, and we can also bring third-party metadata into SAS using REST APIs and building that Azure pipeline. We build on top of that by integrating with other vendors.
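To make the idea of combining metadata from several vendors more concrete, here is a minimal Python sketch. It is purely illustrative: the event fields (`asset`, `feeds`, `source`), the asset names, and the helper functions are all hypothetical, and none of this reflects Egeria's actual schema or the SAS REST APIs. It only shows why pooling events from different tools lets lineage be followed across vendor boundaries.

```python
from collections import defaultdict

# Hypothetical asset events, as two different tools might publish them.
# The field names are illustrative and are NOT Egeria's actual schema.
sas_events = [
    {"asset": "warehouse.sales", "feeds": "cas.sales_abt", "source": "sas"},
    {"asset": "cas.sales_abt", "feeds": "report.revenue", "source": "sas"},
]
etl_events = [
    {"asset": "crm.orders", "feeds": "warehouse.sales", "source": "etl_tool"},
]


def merge_events(*event_lists):
    """Combine events from several tools into one asset -> targets map."""
    edges = defaultdict(set)
    for events in event_lists:
        for event in events:
            edges[event["asset"]].add(event["feeds"])
    return edges


def trace(edges, start):
    """Return every asset reachable downstream of `start`, in visit order."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        for target in sorted(edges.get(node, ())):
            if target not in seen:
                seen.add(target)
                order.append(target)
                stack.append(target)
    return order


shared = merge_events(sas_events, etl_events)
print(trace(shared, "crm.orders"))
```

With only the SAS events, nothing is known about `crm.orders`; once the other tool's events are merged in, the path from the CRM source through to the report becomes visible.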

 

What role does the cataloged data play in Lineage? 

I should also mention that we do not want to see lineage as a separate piece. That is one of the key things we are doing as part of this discovery. The admin is not just building the discovery catalog for users to identify datasets; in the background, we are also building the lineage. As the data is cataloged, we also capture dataset usage. Whether the dataset is used in a report or a model, everything is captured. The relationship between one dataset and another asset is captured as part of that cataloging as well. That means that when users discover an asset in the catalog, they can also go to lineage and find the same asset there. It is the exact same asset, but viewed through the left or right discovery of lineage.
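The point about capturing usage during cataloging can be sketched in a few lines of Python. This is an illustration under assumptions, not how the product is implemented: the `usage` records and asset names are invented, and "left" and "right" are read here as upstream and downstream of the selected asset.

```python
# Hypothetical usage records captured while cataloging: (dataset, consumer).
usage = [
    ("raw_orders", "clean_orders"),
    ("clean_orders", "churn_model"),
    ("clean_orders", "sales_report"),
]


def walk(records, asset, direction):
    """Walk lineage from `asset`.

    direction="left" follows the records upstream (what feeds the asset);
    direction="right" follows them downstream (what uses the asset).
    """
    found, frontier = [], [asset]
    while frontier:
        current = frontier.pop(0)
        for source, consumer in records:
            if direction == "right" and source == current:
                step = consumer
            elif direction == "left" and consumer == current:
                step = source
            else:
                continue
            if step not in found:
                found.append(step)
                frontier.append(step)
    return found


print(walk(usage, "clean_orders", "left"))
print(walk(usage, "clean_orders", "right"))
```

Because the same records serve both discovery and lineage, the asset a user finds in the catalog is the same node they start from when walking left or right.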

 

Where can I find this Catalog Home? I cannot see it under SAS Drive. Is it something that our organization needs to activate? 

If you look at the side menu bar, you will find Discover Information Assets. That is the SAS Information Governance and catalog product. That is where you discover information assets.

 

Can SAS Information Catalog discover CAS Lib and Compute Lib? 

If you are an existing SAS user, you are aware of SAS libraries and CAS libraries, where you can look at SAS datasets. It does both. In fact, the catalog covers both SAS tables and CAS tables. A traditional SAS table in SAS7BDAT format can be cataloged individually, and so can a CAS table, which uses SASHDAT for the in-memory engine. Both can be made available together for users.
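As a small illustration of the two formats mentioned, the Python sketch below labels a file by its extension. The `table_kind` helper and the labels are hypothetical and only mirror the distinction drawn in the answer; the catalog itself works through SAS and CAS libraries, not through file names.

```python
from pathlib import PurePath

# File extensions named in the answer, mapped to the kind of table.
TABLE_KINDS = {".sas7bdat": "SAS table", ".sashdat": "CAS table"}


def table_kind(filename):
    """Label a file as a SAS table, a CAS table, or unknown by extension."""
    suffix = PurePath(filename).suffix.lower()
    return TABLE_KINDS.get(suffix, "unknown")


for name in ["sales.sas7bdat", "SALES.SASHDAT", "notes.txt"]:
    print(name, "->", table_kind(name))
```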

 

Does this platform run on-prem and/or in the cloud?

This is a cloud-native deployment, whether on a private or public cloud. If you have an on-prem private cloud, you can deploy it there as well. It can be deployed on multiple public clouds, although we have partnered with Azure for better integration. In short, this is a cloud deployment, and multiple clouds are supported.

 

Is information governance part of SAS Viya or should it be licensed additionally? 

That is important to know as you start using this. In terms of packaging, any visual package or bundle that is part of SAS Viya gets the basic version of the catalog product. The basic version includes search and the ability to explore assets and see some of the profile metrics. Any visual SAS Viya product, such as SAS Visual Analytics or the modeling products, contains the basic version of the catalog, because regardless of whether customers buy the reporting product or the analytics product, they will want to discover data. We want to give everyone those basic capabilities. Customers can then add additional capabilities, such as semantic identification or information privacy. There are others, such as machine learning, business summary identification, and language detection and workload. Those are part of the SAS Information Governance license. In short, the basic version, SAS Information Catalog, is included with all the bundles, and the licensed version, SAS Information Governance, adds the advanced features.

 

Can cloud data be accessed? 

Yes, of course. This is a cloud-native application, so it works regardless of which cloud it is. You can easily crawl any cloud data on Azure, whether SQL, Azure Synapse, or Azure Blob storage. The same goes for AWS: S3, Redshift, or EMR, Amazon's Hadoop version. The same holds for GCP for Google as well. Any cloud data can be crawled, cataloged, and made available for discovery.

 

What features are available to help with the consumption of these data assets? For example, is there a way to combine columns from different sources into a logical view for consumption by applications? 

The simple answer is yes, and that is part of the actions. When you explore data assets, there are multiple actions available. Prepare Data is the action users can take from SAS Information Governance to do data preparation. One data source could be Snowflake in the cloud and another could be a data warehouse sitting on premises. Regardless of where the data is, as long as the connector is established, all kinds of transformations can be applied. Columns can be removed or transformed. All of that preparation can be done at the Prepare Data layer, and the data engineer (or in some cases the analyst) can create an analytic base table, or a single table for further preparation.
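The kind of preparation described here (combining columns from two sources, removing a column, transforming another) can be sketched generically in Python. This is not the Prepare Data action itself: the rows, the column names, and the `build_base_table` helper are all hypothetical, and a real pipeline would read from the connected sources rather than from in-memory lists.

```python
# Hypothetical rows from two sources, e.g. a cloud warehouse and an
# on-premises warehouse. All names and values are made up.
cloud_rows = [
    {"customer_id": 1, "spend": "120.50", "internal_note": "x"},
    {"customer_id": 2, "spend": "80.00", "internal_note": "y"},
]
onprem_rows = [
    {"customer_id": 1, "region": "emea"},
    {"customer_id": 2, "region": "apac"},
]


def build_base_table(left, right, key, drop=(), transforms=None):
    """Join two row sets on `key`, drop columns, and transform columns."""
    transforms = transforms or {}
    lookup = {row[key]: row for row in right}
    table = []
    for row in left:
        # Combine columns from both sources for the same key.
        merged = {**row, **lookup.get(row[key], {})}
        for column in drop:
            merged.pop(column, None)
        for column, func in transforms.items():
            if column in merged:
                merged[column] = func(merged[column])
        table.append(merged)
    return table


abt = build_base_table(
    cloud_rows,
    onprem_rows,
    key="customer_id",
    drop=["internal_note"],
    transforms={"spend": float, "region": str.upper},
)
for row in abt:
    print(row)
```

The result is a single table with one row per customer, which is the shape an analytic base table typically takes before modeling or reporting.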

 

Does information governance allow for metadata updates by various stakeholders in a decentralized manner, or is it more of a centralized data management method?

If by decentralized you mean integration with other metadata sources and third-party vendors, and/or sharing metadata, then yes, it is. SAS Viya can share metadata with other players through its integration with the Egeria open metadata standard.

 

Recommended Resources

SAS Information Governance page

Getting started with SAS Information Catalog in SAS Viya

SAS Information Governance Release Highlights

Please see additional resources in the attached slide deck.

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

Version history
Last update:
‎05-26-2022 03:15 PM