
Discover Your Data with SAS Information Catalog APIs from Python – Upload to CAS


Navigating through a sea of data assets can be a daunting task. SAS Information Catalog is your navigator in this journey, allowing you to discover, search, and manage your SAS Viya assets efficiently. When a data asset is discovered, hundreds of metrics are calculated. Imagine having the ability to upload these metrics to a CAS table, using a Catalog API. This opens up the possibility of using these rich metrics in custom reports or flows.

 

Metadata Levels

 

When utilizing the SAS Information Catalog REST API to upload metadata and metrics for data assets, tables, or files, you have several options to choose from. The metadata can be of various types, such as:

 

  • dataDictionary: provides basic column-level metadata, such as name and type.
  • dataDictionaryAndProfile: offers extensive metadata that combines both data dictionary and profile metrics.
  • detailedMetrics: gives comprehensive metadata at a column level, including patterns and frequency distribution values.

 

The middle level, dataDictionaryAndProfile, strikes an excellent balance between richness and complexity, making it the ideal choice if you need to identify private data, semantic type or classification at a column level.
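As a quick illustration, the three level names can be kept in one place and checked before a request is built. This is only a sketch: the `LEVELS` dictionary and the `check_level` helper are hypothetical names, not part of the Catalog API, and the strict, case-sensitive check is an assumption. The level names and descriptions come from the list above.

```python
# Hypothetical helper: the level names are the ones listed in this post.
LEVELS = {
    "dataDictionary": "basic column-level metadata (name, type)",
    "dataDictionaryAndProfile": "data dictionary plus profile metrics",
    "detailedMetrics": "column-level detail (patterns, frequency distributions)",
}


def check_level(level: str) -> str:
    """Return the level unchanged, or raise if it is not one of the three names."""
    if level not in LEVELS:
        raise ValueError(f"Unknown level {level!r}; expected one of {sorted(LEVELS)}")
    return level


if __name__ == "__main__":
    print(check_level("dataDictionaryAndProfile"))
```

Checking the name locally catches a typo before any request is sent.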

 

Upload Metadata

 

Get a SAS Viya Access Token

 

For information on obtaining a SAS Viya access token, refer to the previous post Discover Your Data with SAS Information Catalog APIs from Python – Access.

 

Python Program

 

The Python program upload_metadata.py retrieves metrics and metadata from SAS Information Catalog. It then uploads the metrics and metadata to a CAS table, using the Catalog REST API. The program:

 

  • Imports necessary packages.
  • Constructs the Catalog API URL for the request.
  • Retrieves a saved access token.
  • Sends the Catalog API request:
    • The 'Content-Type' header includes the keyword 'instance.upload'.
    • Body contains the caslib name and a CAS table prefix.
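The request pieces listed above can be sketched as a small helper that builds the headers and the body without sending anything. The function name `build_upload_request` and its parameters are hypothetical; the media type and the body fields are the ones used by the program in this post. Serializing a dictionary with `json.dumps` is a design choice for this sketch, so that values such as the caslib name can be passed in rather than typed into a JSON string.

```python
import json

# Media type taken from the program in this post.
UPLOAD_MEDIA_TYPE = "application/vnd.sas.metadata.instance.upload.request+json"


def build_upload_request(token, level, prefix, caslib, server="cas-shared-default"):
    """Return (headers, body) for the upload call; nothing is sent here."""
    headers = {
        # strip() removes a trailing newline left over from reading the token file
        "Authorization": "Bearer " + token.strip(),
        "Content-Type": UPLOAD_MEDIA_TYPE,
        "Accept": UPLOAD_MEDIA_TYPE,
    }
    body = json.dumps(
        {
            "level": level,
            "prefix": prefix,
            "dateTimeStampSuffix": False,  # serialized as JSON false
            "serverName": server,
            "caslibName": caslib,
        }
    )
    return headers, body


if __name__ == "__main__":
    headers, body = build_upload_request(
        "my-token", "dataDictionaryAndProfile", "Catalog", "Public"
    )
    print(headers["Content-Type"])
    print(body)
```

The returned pair can be passed to `requests.post(url, headers=headers, data=body, verify=pem_path)`, as the complete program does.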

 

Here’s the complete code:

 

# 1. Packages
import sys
import requests

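# 2. Arguments: base URL, query string, path to the PEM file
if len(sys.argv) != 4:
    sys.exit("Usage: python upload_metadata.py <baseURL> <search_query> <pem_path>")
print("Number of arguments:", len(sys.argv), "arguments")
print("Argument List:", str(sys.argv) + '\n')
baseURL = sys.argv[1]
search_query = sys.argv[2]
pem_path = sys.argv[3]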

# 3. Construct Variables
print ("REST API Inputs:\n")
print('\nYour SAS VIYA host is ', baseURL)
url = f'{baseURL}/catalog/instances' + search_query
print('\nCatalog API URL: ', url)


# 4. Get the Saved Access Token
print('\nRetrieving the saved token from api/access_token.txt\n')


with open("api/access_token.txt", "r", encoding="UTF-8") as f:
    # Strip surrounding whitespace so a trailing newline does not end up in the header
    token = f.read().strip()
#print(token)

print('\nUpload metadata specified in the URL\n')

# Select one of the following - passed as a parameter

# Upload dataDictionaryAndProfile metrics, filtered by name prefix and type
# url = f"{baseURL}/catalog/instances/?filter=startsWith(name,'WATER')&?filter=contains(type,cas)&level=dataDictionaryAndProfile&limit=10"

# Upload detailedMetrics, filtered by type
# url = f"{baseURL}/catalog/instances/?filter=contains(type,cas)&level=detailedMetrics&limit=10"

# Upload dataDictionary metadata, filtered by type, with the prefix simpleUpload
# url = f"{baseURL}/catalog/instances/?filter=contains(type,cas)&level=dataDictionary&prefix=simpleUpload&limit=100"

headers = {
    'Authorization': 'Bearer ' + token,
    'Content-Type': 'application/vnd.sas.metadata.instance.upload.request+json',
    'Accept': 'application/vnd.sas.metadata.instance.upload.request+json',
}

# Replace placeholders in the body with actual values
data = '''{
    "level": "dataDictionaryAndProfile",
    "prefix": "Catalog",
    "dateTimeStampSuffix": false,
    "serverName": "cas-shared-default",
    "caslibName": "Public"
    }'''

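response = requests.post(url, headers=headers, data=data, verify=pem_path)
print('Response code: ', response.status_code)

# Report success only when the request actually succeeded
if response.ok:
    print('\nThe Catalog Metadata was uploaded for you in CAS in Public.Catalog_DMDictionaryPlusMetrics \n')
else:
    print('\nThe upload request failed; see the response below\n')

print(response.text)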
 

 

This program demonstrates how to use the SAS Viya Catalog REST API to perform a metadata upload.

 

Run the Program

 

The program expects the following command-line arguments:

 

  • The base URL of the SAS Viya server, including the https:// scheme.
  • The query string, containing the filters, the metadata level, and the limit.
  • The path to the PEM file for TLS certificate verification.
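The three positional arguments could also be read with `argparse`, which gives a usage message for free when one is missing. This is a sketch rather than the program's actual argument handling: the `parse_args` helper and the argument names `base_url`, `search_query`, and `pem_path` are chosen here for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments described in the list above."""
    parser = argparse.ArgumentParser(
        description="Upload SAS Information Catalog metadata to a CAS table."
    )
    parser.add_argument("base_url", help="SAS Viya base URL, e.g. https://sas_viya_url")
    parser.add_argument("search_query", help="query string with filters, level and limit")
    parser.add_argument("pem_path", help="path to the PEM file used for TLS verification")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Explicit list for demonstration; call parse_args() to read sys.argv instead.
    args = parse_args(
        [
            "https://sas_viya_url",
            "?filter=contains(type,cas)&level=dataDictionaryAndProfile&limit=10",
            "gelenv_trustedcerts.pem",
        ]
    )
    print(args.base_url)
    print(args.search_query)
    print(args.pem_path)
```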

 

Metadata Level dataDictionaryAndProfile

 
Windows

In a Bash terminal on a Windows machine, you can run the program with command-line arguments:

 

# Certificate on a Windows machine and executable is Python
python upload_metadata.py https://sas_viya_url "?filter=startsWith(name,'WATER')&?filter=contains(type,cas)&level=dataDictionaryAndProfile&limit=10" "C:\\Users\\myuser\\Downloads\\gelenv_trustedcerts.pem"
 
Filter

The query string restricts the upload to data assets:

 

  • Where the asset name starts with ‘WATER’.
  • Whose type contains ‘cas’. This is broader than just a CAS table: it covers a data asset, a table, or a file.
  • Limited to 10 data assets matching the criteria.
  • With the metadata level dataDictionaryAndProfile.
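The parts of the query string described above can be assembled programmatically. The sketch below handles a single filter expression plus the level and the limit; `build_query` is a hypothetical helper, and combining several filters is deliberately left out because it depends on the filter syntax the Catalog API accepts.

```python
from urllib.parse import urlencode


def build_query(filter_expr, level, limit):
    """Assemble a query string of the form passed as the program's second argument."""
    params = {"filter": filter_expr, "level": level, "limit": limit}
    # Keep the filter punctuation readable; other reserved characters are percent-encoded.
    return "?" + urlencode(params, safe="(),'")


if __name__ == "__main__":
    print(build_query("startsWith(name,'WATER')", "dataDictionaryAndProfile", 10))
    # prints ?filter=startsWith(name,'WATER')&level=dataDictionaryAndProfile&limit=10
```

The result can be passed to the program as its second command-line argument, quoted so the shell does not interpret the `&` and `?` characters.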

 

Running the program with the provided parameters uploads the metadata to an in-memory CAS table. In the request body, you specify the CAS server, the caslib name ('Public' in this example), and a prefix for the in-memory table name ('Catalog' in this example).

 

data = '''{
    "level": "dataDictionaryAndProfile",
    "prefix": "Catalog",
    "dateTimeStampSuffix": false,
    "serverName": "cas-shared-default",
    "caslibName": "Public"
    }'''
 
Linux

When running the program, a SAS Viya certificate is used, in the form of a PEM file. In the Windows example above, the PEM file was copied to the 'C:\Users\...\' folder.

 

In a Bash terminal on a Linux machine, the run statement would be:

 

# Certificate on a Linux machine and executable is python3
python3 upload_metadata.py https://sas_viya_url "?filter=startsWith(name,'WATER')&?filter=contains(type,cas)&level=dataDictionaryAndProfile&limit=10" /home/cloud-user/.certs/gelenv_trustedcerts.pem

 

A SAS Viya certificate is used here, in the form of a PEM file. The PEM file is assumed to be present in the '/home/cloud-user/.certs/' folder.

 

Output

 

You will see many metrics both at a table and at a column level, such as:

 

  • The calculated classification.
  • Whether private data was detected.
  • Languages detected.
  • Sentiment detected.
  • Keywords extracted.
  • Classic profiling measures, etc.

 

01_BT_300_Catalog_APIS_Upload_CAS_dataDictionaryAndProfile-1024x576.png


 

The table metadata, such as keywords, tags, most important columns, privacy and so on, is repeated for each column.

 

Conclusion

 

This program demonstrated how to use the SAS Viya Catalog REST API to upload SAS Information Catalog metrics to a CAS table.

 

Acknowledgements

 

  • Nancy Rausch, R&D.  
  • Lavanya Ganesh, R&D.  
  • @AchalPatel , early pioneer of the Upload Catalog API.    

 


Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.

 

 

Find more articles from SAS Global Enablement and Learning here.
