
Implementing File System Security for SAS Viya


This article explains the principles of file system security for teams, projects or departments and their data on a shared file system used with SAS Viya 4. It explains how to segregate read and write authorization to data within a team and beyond. It builds on classic Unix file security mechanisms (POSIX) and on how Viya treats users and groups. It can be applied to POSIX-compliant file systems such as NFS (incl. Azure NetApp Files), Lustre, Ceph and GPFS. It also explains a practical approach to managing file system permissions with the SAS Viya CLI that does not require Kubernetes API permissions.

Introduction

When working with SAS as an analyst in a team it is a common need to save prepared data so your teammates can use it for reporting, ad-hoc analysis or model training. For a typical SAS programmer, it is easier to let SAS auto-create output tables as files (.sas7bdat, parquet etc.), instead of writing explicit SQL DDL statements to create and define output tables in an external database. For this reason, new customers on Viya, and existing SAS customers migrating to Viya, often require a shared filesystem with a secure folder structure for reading and writing prepared data.

Example of a secure folder structure with two teams

Requirements met in this example:

1) Teams should have a folder structure where only the team’s own members can read and write data and create subdirectories.
2) When a team member creates new directories and files, they must inherit the permissions of their parent directory.
3) Some users need read-only access to specific data belonging to another team.
4) Some users are members of multiple teams.

This example has two teams: Sales and Finance. An administrator has created a batch user account “dataadmin” that is needed when creating new team folders.

Directory ownership and permission attributes (owner / group / other*):

OS Directory                      Owner      Group               owner  group  other
/viya-share/sasdata               dataadmin  sasdata             rwx    r-x    ---
/viya-share/sasdata/sales         dataadmin  sales-reader        rwx    r-x    ---
/viya-share/sasdata/sales/data    dataadmin  sales-readwriter    rwx    rws    r-x
/viya-share/sasdata/finance       dataadmin  finance-reader      rwx    r-x    ---
/viya-share/sasdata/finance/data  dataadmin  finance-readwriter  rwx    rws    r-x

Notes:
/viya-share/sasdata: Every new team should get a top folder and a data subfolder like those shown for sales and finance here.
/viya-share/sasdata/sales: Only sales-reader members can access this folder and below.
/viya-share/sasdata/sales/data: Read-only access is granted to sales-reader through the read-only access for “other”, combined with the sales-reader group on the folder above. When sales-readwriter members create subdirectories in SAS Studio or another tool, the subdirectories inherit the sales-readwriter group because of the “s” permission attribute on group*.
The finance folders follow the same pattern with the finance-reader and finance-readwriter groups.

* Attribute abbreviations: d=directory, r=read, w=write, x=execute file or list directory, s=automatically set group id on child files and folders from parent directory.
s also implies x (execute file or list directory).

Example users of the above directory structure

User        Group Memberships
anita       sasdata, sales-reader, sales-readwriter
hugh        sasdata, finance-reader, finance-readwriter, sales-reader
hans        sasdata, finance-reader, finance-readwriter, sales-reader, sales-readwriter
dataadmin*  sasdata, finance-reader, finance-readwriter, sales-reader, sales-readwriter (all groups used on the shared filesystem)

* dataadmin can be a service user or batch user that is only used for managing file system security.

 

Umask* = 002

Linux permission attributes (such as rwx) translate to a four-digit octal number. For example, 0777 indicates rwx rwx rwx attributes. A Linux process’s current umask removes (masks) permissions when new files and directories are created. The basis for new directories is 0777, which means read, write and execute for owner, group and other. For new files the basis is typically 0666, because the execute bit is not set by default. Umask 0002 removes write permission for “other” by clearing the write bit (value 2) in the last octal digit. The result is therefore 0775 (rwx rwx r-x) for new directories and 0664 (rw- rw- r--) for new files.

For an explanation of Linux permissions and numbers, refer to Unix / Linux – File Permission / Access Modes.
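The umask arithmetic can be checked on any Linux host. The following is a minimal sketch, assuming GNU coreutils (`stat -c`) and a scratch directory created with `mktemp`; the file and directory names are only illustrative.

```shell
#!/usr/bin/env bash
# Show the effect of umask 002 on newly created files and directories.
set -euo pipefail

workdir="$(mktemp -d)"   # scratch location, not part of the article's setup
umask 002

touch "$workdir/table.sas7bdat"   # files start from 0666: 0666 & ~0002 = 0664
mkdir "$workdir/subdir"           # directories start from 0777: 0777 & ~0002 = 0775

file_mode="$(stat -c '%a' "$workdir/table.sas7bdat")"
dir_mode="$(stat -c '%a' "$workdir/subdir")"
echo "file: $file_mode"
echo "dir:  $dir_mode"
```

With this umask, group members keep write access to what a teammate creates, while “other” is limited to read (and directory listing).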

Set group id

You might wonder why I use rws instead of rwx for group permissions in the example above. This refers to the “set group id” feature of POSIX filesystems.

The “s” instead of x means that new files and subdirectories inherit the group from the directory. This is a key feature when a group of users is working in the same directory, because it allows users to read and write files that other group members have produced. If “set group id” is not used, new files and subdirectories instead get the creating user’s primary group. The primary group for Active Directory users is by default just “domain users”, meaning all authenticated users. So, to avoid giving all authenticated users access to new files, it is important to use the set group id feature.

In octal notation, this bit is set with the number 2 in front. This means that the security combination rwx, rws, rwx translates to 2777, instead of just 0777.
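As a small illustration of the set group id behavior, the sketch below creates a directory with mode 2775 in a scratch location and checks what a new subdirectory inherits. It assumes Linux with GNU coreutils; the directory names are hypothetical, and because the scratch directory already belongs to the creating user's own group, the demo shows the inherited “s” bit rather than a switch to a different group.

```shell
#!/usr/bin/env bash
# Create a set-group-ID directory and check what a new subdirectory inherits.
set -euo pipefail

workdir="$(mktemp -d)"        # scratch location used only for this demo
umask 002

mkdir "$workdir/data"
chmod 2775 "$workdir/data"    # rwx rws r-x: the leading 2 is the set-group-ID bit

mkdir "$workdir/data/reports" # created the way a team member would create it

parent_mode="$(stat -c '%a %A' "$workdir/data")"
child_mode="$(stat -c '%A' "$workdir/data/reports")"
parent_group="$(stat -c '%g' "$workdir/data")"
child_group="$(stat -c '%g' "$workdir/data/reports")"

echo "parent: $parent_mode (gid $parent_group)"
echo "child:  $child_mode (gid $child_group)"
```

On a shared team directory, the same mechanism is what makes the child carry the team's group instead of the creator's primary group.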

Benefits of applying security to filesystem

Filesystem permissions take effect across most Viya analytical engines including:

1) SAS Compute (incl. Enhanced Compute Engine and DuckDB)
2) CAS*
3) Python
4) R

This means users can switch freely between engines, or one user can use Python while another uses SAS Compute, and authorization to data files will be transparent.

* CAS by default accesses files with a fixed user “sas” instead of the end user. For SAS environments where both CAS and other engines are used in the same team, I recommend configuring CAS to access files as the end users themselves with the CASALLHOSTACCOUNTS setting. CAS can then be used to read and write files in a common folder structure like the one shown above.

Enabling Group IDs in SAS Viya

POSIX attributes such as user id number (UID), group id number (GID) and secondary GIDs are vital elements of the above security model, because Unix stores user and group information only as numbers in the file system’s internal metadata. POSIX attributes are not available from all types of external identity providers to Viya. For example, a SCIM-based identity provider cannot supply UID and GID numbers.
By default, SAS Viya Platform 2023.04 and later releases provide a generated user id (UID) and allow a SAS Administrator to provide group id (GID) values through the REST API or with the Viya CLI. Viya can also be configured to generate GID values. For details see:
SAS Viya and POSIX attributes (UID and GID).

How to Set Directory Permissions on a Shared Filesystem

Prerequisites:
A. SAS Compute, SAS Batch and SAS CAS pod templates have been configured to mount the shared file system.
B. The groups and the dataadmin account you want to use have been created and loaded into SAS Viya.
C. If you have direct sudo access to the shared file system’s server, you can take a shortcut:
    • perform step 1 below to obtain the GID numbers,
    • create the directories with mkdir,
    • change groups with chgrp,
    • and set permission attributes with chmod.
      You can then skip the rest of the steps here.
D. You have installed the sas-viya CLI and its batch and identities plug-ins in a version that matches your Viya platform.
E. You are signed in to the sas-viya CLI as dataadmin (or a similar service user that will be used for managing your file system).
F. You have modified the rule that allows dataadmin (or a group that it is a member of, e.g. SAS Administrators) to run commands with the batch plug-in. See SAS Help Center: Set Up the batch Plug-in
 

Figure 1: Editing one of the rules to grant dataadmin permissions to create batch jobs.

 

Approach:

  1. Obtain the GID numbers for the groups, because you will set these numbers on the directories.
a) Log in to SAS Viya in a browser.
b) Replace the browser location with this URI (replace the values in <>):
<your-hostname>/identities/groups/<group name>/identifier
Take note of the GID number. In the screenshot below it is 1676916821:
 
 

Figure 2: Obtaining GID number for sasdata group.

 
If you are a member of the SAS Administrators group, you can alternatively obtain GID numbers with the sas-viya CLI like this:
sas-viya -k --output json identities show-group --id sasdata --show-advanced
{
    "creationTimeStamp": "",
    "description": "",
    "gid": 1676916821,
    "id": "sasdata",
    "modifiedTimeStamp": "",
    "name": "sasdata",
    "providerId": "ldap",
    "state": "active"
}
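When several groups need to be looked up, the GID can be pulled out of that JSON output in a script. The following is a minimal sketch using only `sed`; `extract_gid` is a hypothetical helper name, and the sample JSON is a shortened copy of the output shown above. If `jq` is available, `jq -r .gid` is a more robust alternative.

```shell
#!/usr/bin/env bash
# Pull the numeric "gid" value out of show-group JSON output.
set -euo pipefail

# extract_gid is a hypothetical helper: reads JSON on stdin, prints the gid value.
extract_gid() {
  sed -n 's/.*"gid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Sample based on the output shown in the article (timestamps are empty there too).
sample_json='{
    "creationTimeStamp": "",
    "description": "",
    "gid": 1676916821,
    "id": "sasdata",
    "name": "sasdata",
    "providerId": "ldap",
    "state": "active"
}'

gid="$(printf '%s\n' "$sample_json" | extract_gid)"
echo "$gid"
```

In practice, the helper would be fed from the CLI, for example `sas-viya -k --output json identities show-group --id sasdata --show-advanced | extract_gid`.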
 

2. We will use the batch plug-in of the sas-viya CLI to start a SAS Compute pod and submit Linux commands to create directories and set permissions. This does not require SAS Administrator membership, just read / write access to the shared file system directory that you want to use as the starting point for the folder structure (e.g. /viya-share/).

a. Validate that the shared file system is mounted in the pod:
sas-viya batch jobs submit-cmd --context default-cmd --cmd "df" --wait --watch-output
Look in the “Mounted on” column to find the location in which to create the sasdata directory. This is an example from my environment, where the mount point is the last value on the line:
192.168.2.4:/export/sas-viya/data  395191296  38588416  336454656  11% /viya-share/
 
b. Create directories:
./sas-viya -k batch jobs submit-cmd --context default-cmd --cmd "mkdir /viya-share/sasdata" --wait --watch-output
Repeat creation of the other directories in the security model.
 
c. Change the group on the directory using the GID numbers obtained above:
./sas-viya -k batch jobs submit-cmd --context default-cmd --cmd "chgrp 1676916821 /viya-share/sasdata" --wait --watch-output
 
d. Change permissions on the directory using chmod:
./sas-viya -k batch jobs submit-cmd --context default-cmd --cmd "chmod 2750 /viya-share/sasdata" --wait --watch-output
Repeat the group and permission changes for the other directories in the security model, using the corresponding groups and permission attributes.
 
Note: If you have more than four directories or teams, a scripted version of the above commands is handy. The same is true if you have more than one SAS environment. The sas-viya batch command can submit a bash script file to the batch pod with the --job-file option.
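A scripted version might look like the sketch below. It is not a tested production script: `BASE`, `DRY_RUN`, `TEAMS`, `run` and `setup_team` are names chosen for illustration, and every GID other than the article's sasdata example is a hypothetical placeholder that must be replaced with values looked up in your own environment. With `DRY_RUN=1` (the default here) the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: build the team folder structure from the security model above.
# The GIDs below are HYPOTHETICAL placeholders; look up the real ones first.
set -euo pipefail

BASE="${BASE:-/viya-share/sasdata}"   # top folder from the article
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands

# Format: team:reader_gid:readwriter_gid (placeholder GIDs)
TEAMS=(
  "sales:2001:2002"
  "finance:2003:2004"
)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_team() {
  local team="$1" reader_gid="$2" readwriter_gid="$3"
  run mkdir -p "$BASE/$team/data"
  run chgrp "$reader_gid" "$BASE/$team"
  run chmod 0750 "$BASE/$team"              # rwx r-x ---
  run chgrp "$readwriter_gid" "$BASE/$team/data"
  run chmod 2775 "$BASE/$team/data"         # rwx rws r-x
}

for entry in "${TEAMS[@]}"; do
  IFS=: read -r team reader_gid readwriter_gid <<<"$entry"
  setup_team "$team" "$reader_gid" "$readwriter_gid"
done
```

Reviewing the dry-run output before setting `DRY_RUN=0` gives a chance to catch a wrong GID before it is written to the shared file system. The account running the script must be a member of every group it assigns, as described under Troubleshooting.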
 

Troubleshooting

  • A non-root user can only change a directory’s group to a group that he or she is a member of. If chgrp fails with “operation not permitted” and your user has recently been added to the group in question, sign out of the Viya CLI and sign in again to refresh your auth token. All current groups are then applied to your SAS Compute pods.
  • To check which groups your user currently has inside the cmd batch pod, run this command:
    sas-viya -k batch jobs submit-cmd --context default-cmd --cmd "id" --wait --watch-output
  • If you cannot see the shared file system mount point inside the cmd batch pod, but only in SAS Compute pods, speak with your Kubernetes administrator. He or she must ensure that a patch is applied to the cmd batch job template, so that the file system is mounted.
  • Some Kubernetes storage drivers (CSI drivers) will systematically change ownership of files and directories to GID 1001 when a Viya pod attempts to mount the filesystem. I have seen this when changing to a new version of an HNAS CSI driver on OpenShift. Similarly, this post describes the same behavior with version 0.1.18 of Azure Managed Lustre and gives a solution: add fsGroupPolicy: None to the CSI driver’s YAML file. Warning: Update to Azure Lustre CSI driver may result in group ownersh… – SAS Support Communities
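The group check from the bullets above can also be scripted, so that a setup script stops early when the required group is missing. This is a minimal sketch; `has_gid` is a hypothetical helper, and the demo simply tests the current user's own primary GID.

```shell
#!/usr/bin/env bash
# Check whether the current process carries a given numeric group ID.
set -euo pipefail

# has_gid is a hypothetical helper: succeeds if GID $1 appears in `id -G`.
has_gid() {
  local wanted="$1" gid
  for gid in $(id -G); do
    if [ "$gid" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

primary_gid="$(id -g)"
if has_gid "$primary_gid"; then
  echo "gid $primary_gid is present"
fi
```

Placed at the top of a script that runs inside the batch pod, a call such as `has_gid 1676916821 || exit 1` would avoid a failing chgrp halfway through.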

Prevent Circumvention of POSIX Permissions on a Shared File System

To prevent circumvention of file system permissions, the file system service and its network boundaries should be configured with the following in mind:
1) Unauthorized clients must be prevented from mounting the file system. This can be ensured with one or more of these techniques:
a. Only allow the IP addresses of the Viya Kubernetes cluster nodes to connect to the file system’s network endpoint.
b. Require a client secret when mounting the file system.
c. Encrypt the file system with a customer-managed key that the client must use for reading, writing or deleting data.
2) Limit the ability of the Kubernetes cluster’s users to publish and deploy custom containers (where they are in control of UID and GID) if custom containers can mount the shared filesystem.
 

16 Group Limit in NFS

The NFS protocol is often used for mounting shared filesystems from Network Attached Storage, NFS servers and cloud storage such as Azure Files, Azure NetApp Files and AWS Elastic File System. By default, an NFS client, such as a SAS pod, transfers the current user’s list of group memberships to the NFS server. However, when Kerberos is not used, the NFS protocol allows transmission of a maximum of 16 groups. This means that if a user is a member of more than 16 groups, the protocol only transmits the user’s first 16 groups, and the NFS server denies access to a directory if the group needed for accessing it has been omitted by the client.
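A quick way to see whether a user is near this limit is to count the GIDs reported by `id -G` inside the pod. The sketch below is only a rough check: `count_gids` and `over_nfs_limit` are hypothetical helpers, and whether the primary group counts toward the 16 depends on the NFS client and server details, which are not covered here.

```shell
#!/usr/bin/env bash
# Count a user's groups and flag when the 16-group NFS limit would be exceeded.
set -euo pipefail

NFS_GROUP_LIMIT=16   # limit described in the paragraph above

# count_gids is a hypothetical helper: counts space-separated GIDs in $1.
count_gids() {
  local -a gids
  read -r -a gids <<<"$1"
  echo "${#gids[@]}"
}

# over_nfs_limit succeeds when the list in $1 holds more than 16 GIDs.
over_nfs_limit() {
  [ "$(count_gids "$1")" -gt "$NFS_GROUP_LIMIT" ]
}

current="$(id -G)"
echo "current user has $(count_gids "$current") group(s)"
if over_nfs_limit "$current"; then
  echo "warning: more than $NFS_GROUP_LIMIT groups; some may not reach the NFS server"
fi
```

The list of groups for a Viya user can be obtained from inside the batch pod with the `id` command shown under Troubleshooting.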

At least four options exist for overcoming this limitation:

1) Design a simple security model with fewer than 16 groups per user. Be aware that if Viya is configured to generate group IDs (GIDs), then both external groups and custom (internal) groups in Viya will have GIDs and count against the limit.
2) Integrate your NFS server or NFS cloud service with the identity provider that is also used by SAS Viya and enable it to look up groups for a user. It can then obtain all groups for a user. I have seen examples of this with Linux NFS server, Hitachi NAS and Azure NetApp Files.
3) Use a combination of group on directory as shown above, and secondary group or user permissions set with an Access Control List (ACL), to reduce the total number of groups needed. Use ACLs sparingly, as the authorization model easily becomes complex. Note that not all NFS implementations support ACLs.
4) Use a more advanced file system without a 16-group limit, such as Lustre. Lustre also performs better than NFS with SAS workloads. Lustre can be bought as a managed filesystem with full Kubernetes support if your SAS environment is situated on Amazon or Azure: Amazon FSx for Lustre and Azure Managed Lustre. I have not tested group limits for Ceph with Red Hat OpenShift Data Foundation with the Ceph protocol, but would be glad to hear any results if you have.
