Leveraging SAS® Information Catalog REST APIs: Programmatically Discovering Data

Started ‎09-29-2023 by
Modified ‎11-03-2023 by

This SAS Explore presentation covered the SAS Information Catalog REST APIs. It showed how these REST APIs, accessible from SAS, Python or shell scripts, let you discover data programmatically by searching for and identifying files, tables and other assets based on specific criteria. Hands-on demonstrations illustrated how programmers and data engineers can use the APIs to incorporate files and tables into data management tasks, trigger actions or automate workflows. The APIs can also gather metadata that gives a high-level overview of the assets in your data landscape and supports informed decision making.

 

Presentation slides are attached to this post. See the attachment at the very top, under the title.

 

Recorded Presentation and Demonstrations

Watch the recorded presentation, including four short demonstrations:

  • SAS Information Catalog overview (00'45'').
  • Catalog Search API from SAS Studio demo (05'50''). The example uses SAS code.
  • Catalog Instances API from Visual Studio Code demo (10'22''). The example uses Python code.
  • Catalog Bots API from Postman (13'11'').

 

 

Why Use Catalog APIs?

No matter how comprehensive the interfaces in an information catalog might be, there will always be a need for custom reporting or custom workflows based on extremely specific asset characteristics.

 

SAS Information Catalog in Two Minutes

SAS Information Catalog allows you to discover, search and manage your SAS Viya assets.

  • Assets can be data assets, such as tables and files, or other types of assets such as SAS Visual Analytics reports, SAS models, SAS decisions, SAS Studio flows or code files.
  • You can search in the catalog for your assets.
    • You can use keywords. This is called free text search.
    • Or you can use calculated metrics and their values. In catalog jargon this is called faceted search. For example, you can search the assets by their asset type.
  • You can also filter by information privacy status, which is calculated by the catalog. For example, you can select only tables that contain private data and then look at the review status given by a subject matter expert, such as approved, under review or rejected.
  • The catalog produces tens of metrics about your data assets. Those metrics can be at the table level or they can be at the column level.
  • You can visualize the metrics in different ways and there are many views depending on the data type. Like with the Rubik's Cube, you can turn your assets around using the Information Catalog and you can look at them from different angles.
  • What drives the discovery behind the scenes? A set of jobs called agents, which can be defined at the caslib or library level, or for an individual table or file.
  • Each agent lists the contents of the library, applies any filters, and then calculates the metrics.
  • Those metrics are then sent to an index that you can use to search.
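
To make the difference between free text search and faceted search concrete, here is a toy in-memory model in Python. It is only an illustration of the two search styles, not how the catalog index is implemented; the `Asset` fields and the sample records are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical index record; field names are illustrative only."""
    name: str
    asset_type: str      # facet, e.g. "table" or "report"
    privacy: str         # facet, e.g. "private" or "public"
    review_status: str   # facet, e.g. "approved", "under review", "rejected"
    description: str = ""


def free_text_search(assets, keyword):
    """Keep assets whose name or description contains the keyword."""
    kw = keyword.lower()
    return [a for a in assets
            if kw in a.name.lower() or kw in a.description.lower()]


def faceted_search(assets, **facets):
    """Keep assets whose facet values all match exactly."""
    return [a for a in assets
            if all(getattr(a, key) == value for key, value in facets.items())]


# Invented sample data standing in for the searchable index.
INDEX = [
    Asset("CUSTOMERS", "table", "private", "approved", "customer master data"),
    Asset("SALES_2023", "table", "public", "under review", "monthly sales"),
    Asset("Sales dashboard", "report", "public", "approved", "sales overview"),
]

if __name__ == "__main__":
    print([a.name for a in free_text_search(INDEX, "sales")])
    print([a.name for a in faceted_search(INDEX, asset_type="table",
                                          privacy="private")])
```

Free text search matches on words, while faceted search narrows the result set by exact metric values; the two can be combined by passing the output of one function to the other.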

 

Use Cases and Catalog API Endpoints

Let's look at a few use cases and at the corresponding catalog API endpoints:

  1. First, we're going to talk about asset picking or asset search. The API endpoint is /catalog/search.
  2. Then we're going to look at some application using asset insights. The API endpoint is /catalog/instances.
  3. And lastly, we're going to mention how you can manage your agents using the /catalog/bots endpoint.

 

Asset Search or Asset Picking

Imagine you have the following requirements:

  • You are responsible for providing a data self-service area to data scientists, data analysts or data engineers. In their analysis or reporting, they can only use tables of a certain type, preferably sourced from a specific location.
  • The tables must be reviewed by a subject matter expert before usage, and the review status can be approved or under review.
  • Also, the tables must be consumed only from a designated library that defines the self-service area.

How would you go about fulfilling those requirements? You can write a program that invokes the Catalog Search REST API.

 

Play the video above, around 05'50'', to see an example where we call the Catalog Search REST API from SAS Studio, using SAS code.
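
The video uses SAS code. As a rough Python sketch of the same idea, the snippet below builds a request to /catalog/search and then filters the results on the client side. The host name, the token, the `q` and `limit` query parameters, and the `name`, `library` and `reviewStatus` fields are assumptions made for illustration; check the API documentation for the actual parameters and response shape.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query, limit=20):
    """Build (but do not send) a GET request to /catalog/search.

    The `q` and `limit` parameters are assumed, not confirmed by the article.
    """
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    url = f"{base_url.rstrip('/')}/catalog/search?{params}"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def run_search(request):
    """Send the request and parse the JSON body (requires a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pick_self_service(items, library,
                      allowed_statuses=("approved", "under review")):
    """Keep tables from the designated library with an accepted review status.

    `items` is a list of dicts with hypothetical keys
    `name`, `library` and `reviewStatus`.
    """
    return [item["name"] for item in items
            if item.get("library") == library
            and item.get("reviewStatus") in allowed_statuses]


if __name__ == "__main__":
    req = build_search_request("https://viya.example.com", "MY_TOKEN",
                               "customer")
    print(req.full_url)
```

Keeping the request builder separate from the call that sends it makes the URL easy to inspect before anything reaches the server.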

 

Asset Insights

 

Imagine now that you have a different set of requirements for your reporting. In your marketing reporting, you are using a certain set of tables. But there are some conditions:

  • Those tables must not contain private or sensitive data, must not contain content classified as having negative sentiment, and must not contain data that identifies an individual or an organization.
  • Moreover, tables that contain sensitive data should immediately trigger an isolation job.

You can fulfill those requirements by writing a program that invokes the Catalog Instances REST API.

 

Play the video above, around 10'22'', to see an example where we call the Catalog Instances API from Visual Studio Code. The example uses Python code.
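
The conditions above can be sketched as a small policy check in Python. This is not the code from the video: it assumes the table attributes have already been retrieved from /catalog/instances and flattened into dictionaries, and the flag names (`hasPrivateData` and so on) are hypothetical placeholders for whatever the catalog actually returns.

```python
# Hypothetical attribute flags mapped to a human-readable reason.
BLOCKING_FLAGS = {
    "hasPrivateData": "private data",
    "hasSensitiveData": "sensitive data",
    "hasNegativeSentiment": "negative sentiment",
    "identifiesIndividualOrOrg": "identifies an individual or organization",
}


def evaluate_table(attrs):
    """Return whether a table is usable, must be isolated, and why."""
    reasons = [label for key, label in BLOCKING_FLAGS.items()
               if attrs.get(key)]
    return {
        "usable": not reasons,
        "isolate": bool(attrs.get("hasSensitiveData")),
        "reasons": reasons,
    }


def apply_policy(tables, isolate):
    """Call `isolate(name)` for sensitive tables; return the usable ones.

    `tables` maps a table name to its attribute dict. `isolate` is any
    callable, for example a function that submits an isolation job.
    """
    usable = []
    for name, attrs in tables.items():
        verdict = evaluate_table(attrs)
        if verdict["isolate"]:
            isolate(name)
        if verdict["usable"]:
            usable.append(name)
    return usable


if __name__ == "__main__":
    sample = {
        "CAMPAIGNS": {},
        "REVIEWS": {"hasNegativeSentiment": True},
        "PATIENTS": {"hasSensitiveData": True, "hasPrivateData": True},
    }
    print(apply_policy(sample, isolate=lambda n: print("isolating", n)))
```

Passing the isolation action in as a callable keeps the policy logic independent of how the isolation job is actually launched.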

 

Agent Management

Let's say that we have a caslib or a SAS library that we would like to discover. You can use the /catalog/bots endpoint to:

  • Retrieve the existing agents.
  • Create an agent on a new caslib or an existing library.
  • Run the discovery agent by using another REST API.
  • Check the state of the agent and its run history.
  • Preview how many assets from the library will be discovered.
  • Update the details of a discovery agent or delete it.
  • Launch an ad-hoc analysis job.

Play the video above, around 13'11'', where we call the Catalog Bots API from Postman.
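
The video uses Postman. For scripting, a minimal Python sketch of a request builder for the /catalog/bots endpoint might look like the following. Only the endpoint path comes from this article; the host, the token, the `/catalog/bots/{id}` path for a single agent and the payload fields are assumptions for illustration, so consult the API documentation before relying on them.

```python
import json
import urllib.request


def bots_request(base_url, token, method="GET", bot_id=None, payload=None):
    """Build (but do not send) a request to the /catalog/bots endpoint.

    `bot_id` appends an assumed /{id} segment; `payload` is sent as JSON.
    """
    url = f"{base_url.rstrip('/')}/catalog/bots"
    if bot_id is not None:
        url += f"/{bot_id}"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=method)


if __name__ == "__main__":
    base = "https://viya.example.com"   # placeholder host
    list_agents = bots_request(base, "MY_TOKEN")
    # Hypothetical payload; the real schema is not given in the article.
    create_agent = bots_request(base, "MY_TOKEN", method="POST",
                                payload={"name": "demo-agent"})
    delete_agent = bots_request(base, "MY_TOKEN", method="DELETE",
                                bot_id="1234")
    for req in (list_agents, create_agent, delete_agent):
        print(req.get_method(), req.full_url)
```

Each request can then be sent with `urllib.request.urlopen` once the URL and payload have been checked against a real environment.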

 

Conclusions

 

  • The Catalog REST APIs can be triggered from any programming or scripting language.
  • They are a great way of finding and selecting what you need fast from vast amounts of data.
  • Programmatically, they can help you derive additional insights about your data.
  • They can also be used to enforce your organization's governance policies by triggering jobs when certain conditions are met.
  • And lastly, you can automate the asset management by creating and running agents.


The Catalog REST APIs will be public during the fall of 2023, and I can only hope that this session will encourage you to try the Catalog REST API, to explore your assets using SAS Information Catalog and to talk about the product in your organization.
