In my previous article, I explained how you can display PDF files inside a SAS Visual Analytics report. That request came from a customer who wanted to load more than 10,000 files into the SAS Content Server. They had specific requirements for securing the files, which is why they chose to upload the files and their folder structure into the SAS Content Server rather than use an alternative such as a standalone container serving the files. This article is not about why they made that choice, but about how to do it. The approach we chose: Python code that automates the upload process.
As mentioned, the customer has more than 10,000 files to upload, stored in a specific folder structure. They wanted the SAS Content Server to replicate the folder structure they have at the OS level. The customer has Python knowledge and chose Python because it makes it easy both to navigate the file system and to call REST APIs. Creating the folder structure and uploading the files to the SAS Content Server can be done with the REST APIs surfaced by SAS Viya, which include APIs for files and folders.
The customer wanted a quick solution, as the upload would be a one-time job and there would be no need to upload so many files in the future. This need for a fast solution influenced the decision to use the sasctl package for Python.
If you are a data scientist developing models in Python and loading them into SAS Model Manager, you may already know the sasctl package for Python. The package was mainly designed to load open-source models into SAS Viya and to handle their deployment into production, using the Python language and the SAS Viya REST APIs. To serve that purpose, the developers created "services" that can be used for model management tasks, but also for core tasks such as handling files, folders and authentication. As a result, if you are a Python developer, this package reduces the time needed to develop a solution, because many SAS Viya REST API endpoints are already implemented. And if a specific endpoint is not available as a service, the package also provides a way to call the SAS Viya REST APIs directly.
Here is a list of the available services:
Using the sasctl package considerably reduced the effort: loading the 10,000-plus files and their folder structure took just a few lines of code.
A bit of explanation might be required about the code.
The first few lines import the required components.
Lines 7 to 9 define the configuration properties used to connect to the environment.
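As a minimal sketch, assuming you would rather not hard-code credentials in the script, the connection properties could be read from environment variables. The variable names SAS_VIYA_HOST, SAS_VIYA_USER and SAS_VIYA_PASSWORD and the function load_connection_config are hypothetical.

```python
import os

# Hypothetical environment variable names; adjust them to your environment.
REQUIRED = {
    "hostname": "SAS_VIYA_HOST",
    "username": "SAS_VIYA_USER",
    "password": "SAS_VIYA_PASSWORD",
}


def load_connection_config(env=None):
    """Return the hostname, username and password used to open a sasctl Session.

    Values come from environment variables instead of being hard-coded.
    Raises KeyError naming the first missing variable.
    """
    env = os.environ if env is None else env
    config = {}
    for key, var in REQUIRED.items():
        if var not in env:
            raise KeyError(f"missing environment variable {var}")
        config[key] = env[var]
    return config
```

The three values can then be passed to sasctl's Session as the hostname, username and password.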
The following code block defines a function that walks the specified OS folder and builds two lists: folders_list and files_list.
In this case, the function only retrieves files whose names end with ".pdf", as can be seen on line 39.
The d_list variable holds the folders that should be created in the SAS Content Server.
The f_list variable holds one mapping per file, containing the source file path and the target folder location.
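The folder-walking logic described above can be sketched with os.walk. This is an illustration under stated assumptions, not the customer's actual code: the function name build_upload_lists, the "source" and "folder" dictionary keys, and the target_root parameter (the SAS Content Server folder that mirrors the local root) are hypothetical. The extension test is also made case-insensitive, so files ending in ".PDF" are picked up too.

```python
import os
import posixpath


def build_upload_lists(source_dir, target_root, extension=".pdf"):
    """Walk source_dir and return (folders_list, files_list).

    folders_list: SAS Content Server folder paths to create, parents first.
    files_list:   one dict per matching file, pairing the local source path
                  with the target folder path (hypothetical keys).
    """
    folders_list = []
    files_list = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        # Sorting in place makes the traversal order deterministic;
        # os.walk (top-down by default) visits parents before children.
        dirnames.sort()
        rel = os.path.relpath(dirpath, source_dir)
        # Turn the OS-relative path into a forward-slash Content Server path.
        parts = [] if rel == os.curdir else rel.split(os.sep)
        target_folder = posixpath.join(target_root, *parts)
        folders_list.append(target_folder)
        for name in sorted(filenames):
            # Case-insensitive extension check (assumption, see lead-in).
            if name.lower().endswith(extension):
                files_list.append({
                    "source": os.path.join(dirpath, name),
                    "folder": target_folder,
                })
    return folders_list, files_list
```

Because parents always appear before their children in folders_list, the folders can be created in list order.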
So far, the code doesn't perform any action. It just defines a few parameters and a function. The following code is the glue between the different components:
On line 51, we create a Session for the user. Within that session, we build the two lists of folders and files. Using the folders list, we call the folders service to create the folders in SAS Viya. Once that is done, we call the files service to create the files in the SAS Content Server, using the information from files_list.
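The glue step can be sketched as follows. It assumes that folders.create_path(path) creates a folder along with any missing parents, and that files.create_file accepts a local file path plus a folder keyword pointing at the target folder path; check both against the sasctl documentation for your version. Each files_list entry is assumed to be a dict with hypothetical "source" and "folder" keys. The upload loop takes the two service functions as parameters (a hypothetical design choice) so it can be exercised without a SAS Viya server, and run_upload wires it to sasctl.

```python
def upload_all(folders_list, files_list, create_path, create_file):
    """Create every target folder, then upload every file.

    create_path and create_file are injected so the loop can run against
    sasctl's services or against stand-ins. Returns the number of files uploaded.
    """
    for folder in folders_list:
        create_path(folder)  # like mkdir -p: also creates missing parents
    for entry in files_list:
        create_file(entry["source"], folder=entry["folder"])
    return len(files_list)


def run_upload(hostname, username, password, folders_list, files_list):
    """Wire upload_all to sasctl.

    Requires the sasctl package and a reachable SAS Viya server.
    Assumes folders.create_path(path) and files.create_file(file, folder=...)
    behave as described in the lead-in.
    """
    from sasctl import Session
    from sasctl.services import files, folders

    with Session(hostname, username, password):
        return upload_all(folders_list, files_list,
                          folders.create_path, files.create_file)
```

Creating all folders first means every upload targets a folder that already exists.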
While the sasctl package contains a rich set of functionality, you can also extend it with your own contributions. The package is available on GitHub, and you can contribute extra functionality to it, as I did with the create_path method. When I started helping this customer, there was no function to create a folder in the SAS Content Server if its parent folder did not already exist. The create_path method implements that functionality, much like the following Linux command:
mkdir -p /build/complete/folder/path/newFolder
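The idea behind mkdir -p can be illustrated with a short, backend-agnostic sketch. This is not the create_path code contributed to sasctl: find_child and create_folder are hypothetical callables that, in a real script, would look up and create folders through the Folders REST API. The function walks the path one segment at a time, reuses folders that already exist and creates only the missing ones.

```python
def create_path(path, find_child, create_folder):
    """Create every missing folder along path, like `mkdir -p`.

    find_child(parent, name) returns the existing child folder or None.
    create_folder(name, parent) creates the folder and returns it.
    parent is None at the top level. Returns the deepest folder
    (None if path has no segments).
    """
    parent = None
    # Skip empty segments produced by leading, trailing or doubled slashes.
    for name in [segment for segment in path.split("/") if segment]:
        child = find_child(parent, name)
        if child is None:
            child = create_folder(name, parent)
        parent = child
    return parent
```

Because existing folders are reused, calling the function twice with the same path creates nothing the second time, which is what makes mkdir -p safe to rerun.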
The code for the method is the following:
If you want to contribute to the project, follow the contribution instructions provided in the project.
Using packages like sasctl is advantageous because it reduces the time to results and keeps you, as a developer, from reinventing the wheel. The package brings a lot of functionality that is relevant beyond model management. The provided services ease development, and they also help you understand how to interact with the SAS Viya REST APIs. Wrappers like Session are an elegant way to authenticate the user. Session supports different authentication mechanisms: username/password, as we have seen here, and authinfo-based authentication.
The fact that you can contribute to this project is another benefit. You should not be afraid to contribute, for several reasons:
Most importantly, the sasctl project is yours. You can make of it what you want!
Find more articles from SAS Global Enablement and Learning here.