SAS Information Catalog – Bring governance for your data and AI assets together
Data is undoubtedly one of the most important assets for an organisation, and effective governance of this valuable asset helps organisations meet their regulatory obligations as well as innovate faster. Those who have implemented governance best practices for their data assets can trust their data and explain its journey through the entire decisioning process. Organisations are increasingly adopting data catalogs to support their governance-led initiatives and improve transparency in their business. A recent research report by Fortune Business Insights projects the global data catalog market to grow from $878 million in 2023 to $3.4 billion by 2030, at a CAGR of 21.5% during the forecast period.
Governance for data and AI
While governance of data assets has remained high on the agenda for several years, data-driven organisations now see extending the governance remit to AI assets as equally important. With openness, trust and governance as its core values, SAS Viya is a decisioning platform that brings together all aspects of the analytics lifecycle, making it easy for different user personas to work with data at different stages and collaborate effectively. This ability to orchestrate the entire analytics lifecycle from one unified platform has enabled SAS to bring governance for all information assets under one roof through the SAS Information Catalog.
While data engineers, data scientists and business analysts can use SAS Information Catalog to search for relevant data assets for their analysis and reporting requirements, they can also use the catalog to find dataflows (data pipelines), models (both SAS and open source), rules, decisions and more. Business users can search for relevant reports while data stewards and IT administrators can detect duplicate assets for improved overall governance.
Federated governance for all data sources (#DataMeshUsers are you listening?)
SAS Viya provides transparent read/write access to data from a wide range of sources and platforms. This includes leading cloud data platforms such as Azure Synapse, AWS Redshift, Google BigQuery, Snowflake and Databricks; relational and non-relational databases such as Oracle, Teradata and MongoDB; and open big data file formats such as Apache ORC and Apache Parquet. Regardless of the source or where the data resides, Data Discovery Agents in SAS Information Catalog enable users to search the entire data landscape from one unified catalog and find the relevant resource quickly. This can be particularly useful when implementing a Data Mesh methodology, where data product owners can rely on the Information Catalog as the one-stop shop for users from all business domains.
Finding the right resource for your need (Know your data and use it with confidence)
The search capabilities of SAS Information Catalog enable users to narrow down their search for information assets to the most relevant list, so that they can quickly evaluate asset fitness, avoid rework, and make informed choices. A search can combine keywords, metadata facets and extended attributes such as status, asset owner or tags. Search results provide detailed information on each data asset, including data health, data quality and key metrics, at both table and column levels. Data exploration views, which include correlation and outlier detection information, show the data profile and column analysis, so users can quickly determine whether a data asset is fit for purpose before they begin any detailed analysis on it. The ability of Data Discovery Agents to analyse the semantic type of each column is particularly useful for data audit and regulatory compliance, as data stewards can use it to detect personal data in a data asset. Lineage View is another key example of governance with SAS Information Catalog: it shows a business-level impact analysis of information assets (including open source models) in the context of their sources, their outputs, and the relationships between them.
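To make the idea of combining keywords, facets and attributes concrete, here is a minimal Python sketch that filters a small in-memory list of assets. It is only an illustration of the filtering logic: the Asset fields, the sample records and the search function are all hypothetical and do not reflect the catalog's actual schema or query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    asset_type: str
    status: str
    owner: str
    tags: set = field(default_factory=set)


def search(assets, keyword=None, **attributes):
    """Return assets that match the keyword AND every attribute filter."""
    results = []
    for asset in assets:
        # Keyword filter: case-insensitive substring match on the asset name.
        if keyword and keyword.lower() not in asset.name.lower():
            continue
        # Tag filter: every requested tag must be present on the asset.
        tags_wanted = attributes.get("tags")
        if tags_wanted and not set(tags_wanted) <= asset.tags:
            continue
        # Remaining filters: exact match on the named attribute.
        others = {k: v for k, v in attributes.items() if k != "tags"}
        if any(getattr(asset, k) != v for k, v in others.items()):
            continue
        results.append(asset)
    return results


# Made-up sample assets, for illustration only.
ASSETS = [
    Asset("customer_accounts", "table", "approved", "alice", {"pii", "finance"}),
    Asset("customer_churn_model", "model", "draft", "bob", {"marketing"}),
    Asset("sales_report", "report", "approved", "alice", {"finance"}),
]

matches = search(ASSETS, keyword="customer", status="approved", tags=["pii"])
print([a.name for a in matches])  # ['customer_accounts']
```

Each additional filter narrows the result set, which is the behaviour described above: a keyword alone returns both customer assets, while adding the status and tag filters leaves only the approved table tagged as containing personal data.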
Working with Glossary
Since the February 2024 release, SAS Information Catalog has included a Glossary component, which enables users to manage terms and associate them with information assets registered in the catalog. This provides a consistent source for term definitions that can be used by business and technical users alike. SAS customers who have used SAS Business Data Network in earlier software versions will quickly recognise the potential this offers for improved overall governance.
Extending SAS Information Catalog with REST API
The SAS Information Catalog REST API was published in the October 2023 release and serves as the programming interface to the Catalog Service. The API can be called from any programming language and offers a great way to quickly find and select what you need from vast amounts of data. It enables SAS metadata to be shared with any enterprise catalog, and it can also be used to enforce an organisation’s governance policies by triggering jobs when certain conditions are met. One of my personal favourite examples of using this REST API involves adding new asset types or new attributes to existing assets (e.g. data from usage logs) to meet bespoke business requirements. This opens a fascinating new set of possibilities in which SAS Information Catalog can make data-driven organisations even more productive while maintaining high standards for information governance.
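As a rough sketch of what calling the API from a programming language might look like, the Python snippet below builds an authenticated search request using only the standard library. The host name, the "/catalog/search" path, the "q" and "limit" query parameters and the bearer-token header are assumptions made for illustration; check the published API reference for the actual resource names and authentication flow. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

# ASSUMPTIONS: the host, the "/catalog/search" path and the "q"/"limit"
# query parameters are placeholders for illustration. Confirm the real
# resource names in the API reference before using this.
BASE_URL = "https://viya.example.com"
SEARCH_PATH = "/catalog/search"


def build_search_request(query, token, limit=10):
    """Build (but do not send) an authenticated GET request for a search."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    url = f"{BASE_URL}{SEARCH_PATH}?{params}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # hypothetical access token
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_search_request("customer churn", token="abc123")
print(request.full_url)
# https://viya.example.com/catalog/search?q=customer+churn&limit=10
```

Passing the request to urllib.request.urlopen would send it. Building the request separately keeps the URL and headers easy to inspect and test before anything touches a live environment.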