
Troubleshooting "over spilled" temporary space for SAS9 Runtime (SPRE) on Viya

Started 11-22-2019 by
Modified 11-22-2019 by



As a Technical Account Manager working in SAS Singapore, I must sometimes step down from shaping technology visions for my customers, roll up my sleeves and help them overcome technical challenges that prevent them from using their SAS Viya platform efficiently. Recently one of my customers alarmed me by reporting that their system was unresponsive and users could not log in to their SAS IDE (StudioV). This post walks through how the root cause analysis was done and clarifies what "killed" the system, including what was done, by whom, when and why.




Shortly afterwards, the customer shared the details of why the alarm had been raised, and after a quick examination of the alarm message we found that CAS_DISK (CAS Disk Cache) had run out of space. Because we had recently moved the SAS9 working area SAS_TMP (Workspace, SPRE, SAS9 runtime, whatever you call it...) from a "sluggish" disk to the faster, more I/O-capable CAS_DISK device, there were only two possible causes for the disk running out of space:


  1. A SAS9 program generated a huge amount of temporary data.
  2. Too much data was loaded into the in-memory area of SAS Viya (CAS). Note that Viya stores a copy of the file/table on disk.

Given the recent move of SAS_TMP, my intuition told me to examine potential cause no. 1, SAS9 programs, first. I asked the customer to check the disk space utilization. The commands used can be seen in the screenshots themselves.

The picture below confirms that the CAS_DISK volume had run out of space.




The sas_tmp folder, used as the working area for SAS9 programs, accounts for the majority of the space used.
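The screenshots with the exact commands are not reproduced here. As a minimal sketch, assuming a Linux host with GNU coreutils, the two checks (how full the volume is, and which entries under the work area take the most space) could look like this. The function names are hypothetical helpers, not SAS tools, and the directory you pass in is your own SAS_TMP path.

```shell
#!/bin/sh
# Hypothetical helpers for the two disk checks; pass your own SAS_TMP path.

# volume_usage DIR: size, used, available and use% of the file system holding DIR.
volume_usage() {
  df -hP "$1"
}

# largest_entries DIR [N]: the N (default 10) largest entries directly under DIR,
# sizes in KiB, biggest first.
largest_entries() {
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Usage: `volume_usage /path/to/sas_tmp` followed by `largest_entries /path/to/sas_tmp 5`, where the path is a placeholder.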




And below is the file that took all the free space. The file name is worth remembering: as I will show later, we use it in the subsequent log forensics to identify which user process and which SAS code "killed" the system.
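To spot such a file without knowing its name, a sketch like the following lists every regular file above a size threshold. It assumes GNU find (for size suffixes such as +1G), and the helper name is hypothetical.

```shell
# large_files DIR SIZE: list regular files under DIR larger than SIZE,
# using find's -size syntax (e.g. +1G). -xdev keeps the search on DIR's file system.
large_files() {
  find "$1" -xdev -type f -size "$2" -exec ls -lh {} +
}
```

Usage: `large_files /path/to/sas_tmp +10G`, with a placeholder path and threshold.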




If you need to locate the work files, you can do so with the command below:

find /opt/sas -name "*SAS_work*"
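A variation on the same search, sketched here under the assumption that the work area sits on a local Linux file system, lists each SAS_work directory with its size and owning account. The owner only points to a real person when workspace processes run under the users' own OS accounts, which is discussed at the end of this post. The helper name is hypothetical.

```shell
# work_dirs DIR: print "size_KiB<TAB>owner<TAB>path" for every SAS_work* directory
# under DIR, largest first. -prune stops find from descending into each match.
work_dirs() {
  find "$1" -type d -name 'SAS_work*' -prune | while IFS= read -r d; do
    size=$(du -sk "$d" | cut -f1)
    owner=$(ls -ld "$d" | awk '{print $3}')
    printf '%s\t%s\t%s\n' "$size" "$owner" "$d"
  done | sort -rn
}
```

Usage: `work_dirs /opt/sas`, mirroring the search root of the find command above.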

I took the file name and went to the folder containing the Workspace (SAS9) logs...

cd /var/log/sas/viya/compsrv/default

What interests me are the logs that contain all the commands run by the user. This is how I list only those logs:

ls -l *.pgm.log

These are the types of logs I am interested in...




Next, I scan the contents of these logs for the name of the large temporary file... remember "SAS_workBEB5..."? Here is how I narrow the search to the date when the incident occurred:

grep -i 'sas_WorkBEB' *2019-11-14*
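The same search can be wrapped so that it prints only the names of the matching logs. This is a minimal sketch assuming, as in the command above, that the log file names contain the date; -F treats the pattern as a fixed string rather than a regular expression, and the helper name is hypothetical.

```shell
# logs_mentioning LOG_DIR PATTERN DATE: print the names of files in LOG_DIR whose
# name contains DATE and whose content contains PATTERN (case-insensitive).
# The body runs in a subshell, so the cd does not affect the caller.
logs_mentioning() (
  cd "$1" || exit 1
  grep -ilF -- "$2" *"$3"* 2>/dev/null
)
```

Usage: `logs_mentioning /var/log/sas/viya/compsrv/default SAS_workBEB5 2019-11-14`.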



Gotcha! So what have I learned?


  1. I know which log file contains the code that created this "giant" file, so I can later review the code that caused it.
  2. I also know the process ID, 44447, which should help me identify the actual user who "fired" the command.

1. Analyzing the script

The log file contains several interesting things: the actual code, CPU time, elapsed time, result rows, errors, etc. Analyzing the log together with the data scientist, we found a product (Cartesian) join that created the huge temp file. When the data scientist saw the result row counts of the individual queries, he immediately knew something was wrong with the join condition... or the underlying data.
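When a log is long, the relevant lines can be pulled out with grep. The sketch below assumes the PROC SQL notes use the usual wording ("...Cartesian product joins that can not be optimized" and "Table ... created, with N rows and M columns"); check the exact messages in your own logs. The helper name is hypothetical.

```shell
# sql_warnings LOG: print, with line numbers, PROC SQL notes about Cartesian
# product joins, the row counts of created tables, and any ERROR lines.
sql_warnings() {
  grep -n -E 'Cartesian product|created, with [0-9]+ rows|^ERROR' "$1"
}
```

A table created with an unexpectedly large row count right after a Cartesian product note is exactly the pattern the data scientist recognized.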




2. Identifying the user behind the process (PID 44447)

By running the command below, I should be able to see which user owns the process (PID) that ran the query. If the system had completely frozen, I could also kill the process.

ps -ef | grep 44447

You should see output like this. Note that my PID (7734) is different, because I don't have access to the customer environment and am experimenting in my sandbox instead.
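A narrower variant, sketched under the assumption of a Linux procps (or BSD) ps, asks ps for just the fields of interest for one PID; unlike `ps -ef | grep`, it cannot accidentally match the grep process itself. The helper name is hypothetical.

```shell
# process_owner PID: print user, PID, parent PID, elapsed time and full command line.
# ps exits non-zero if no such process exists.
process_owner() {
  ps -o user=,pid=,ppid=,etime=,args= -p "$1"
}

# If the process has to go, prefer a polite TERM before resorting to KILL, e.g.:
#   kill -TERM 44447
```

Usage: `process_owner 44447`.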





SAS provides a utility that monitors the environment from various perspectives: disk, machine, job, etc. Why do I mention this tool? Because it lets you analyze what users are doing and how they use server resources, and ultimately kill their inefficient jobs. Don't forget: "With power comes responsibility." When I tried to run the utility, I got the following error:




After some digging, I found that the issue was that I didn't have passwordless SSH enabled to all nodes. You're lucky today... this is how to enable it:


  1. Run this command to create an SSH key pair without a passphrase and store it in your home directory.
     ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
  2. Append the public key to the authorized_keys file.
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Ensure that only the user has access to the authorized_keys file.
     chmod 600 ~/.ssh/authorized_keys
  4. Test the setup. What we want to see here is the SSH connection being established without a password prompt.
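The test in step 4 can also be scripted across all nodes. This sketch relies on the standard OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout; the host names in the usage line are placeholders and the helper name is hypothetical. On a multi-node deployment the public key must also be present in authorized_keys on every other node (ssh-copy-id can append it there), unless home directories are shared.

```shell
# check_passwordless_ssh HOST...: report, per host, whether a key-based SSH
# login works without any password prompt. Returns non-zero if any host fails.
check_passwordless_ssh() {
  status=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: OK"
    else
      echo "$host: FAILED"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_passwordless_ssh node1 node2 node3`.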

Note: Later I found out that the utility shows only CAS engine sessions, not SAS9 (Workspace) sessions, so my mission to identify the actual user from the PID continued...

Warning: Never ask your customer to enable passwordless SSH for root; they might question your competence. SAS asks IT to enable passwordless SSH for a standard user, e.g. viyadep, which is typically used by the Ansible controller.

"Post Mortem" Cleaning

Because the process was still holding the large temp file, we decided to kill it so that the file could be removed. The admins removed the file, but later I found out there is a more elegant way to do this: SAS has a dedicated command, cleanwork, for cleaning up temporary files that are no longer used by any process. Don't be shy to Google it and read the manual for the cleanwork command. Hmm, but where is that utility???


find /opt -name "*cleanwork*"

...and here it is:

cd /opt/sas/spre/home/SASFoundation/utilities/bin/

For how to use the command, please read the manual.
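As a minimal sketch, assuming the binary in that directory is named cleanwork and takes the directory to scan as its argument (confirm both in the manual for your release), a guarded call could look like this. The helper name and the CLEANWORK_BIN variable are hypothetical.

```shell
# Assumed install location, taken from the directory found above; adjust as needed.
CLEANWORK_BIN="${CLEANWORK_BIN:-/opt/sas/spre/home/SASFoundation/utilities/bin/cleanwork}"

# run_cleanwork DIR: hand DIR (your SAS_TMP work area) to cleanwork, which removes
# work directories left behind by SAS sessions that are no longer running.
run_cleanwork() {
  if [ ! -x "$CLEANWORK_BIN" ]; then
    echo "cleanwork not found at $CLEANWORK_BIN" >&2
    return 1
  fi
  "$CLEANWORK_BIN" "$1"
}
```

Usage: `run_cleanwork /path/to/sas_tmp`, with a placeholder path.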


Checking status of services

While "hacking" my sandbox, I somehow destroyed "something" and was not able to log on via StudioV. Then I discovered a useful command that returns a list of all Viya services and their status. Using this command, I found out that one of the services was down, so I tried to restart it.


cd /etc/init.d
./sas-viya-all-services status

To check the health of Viya services:
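The status list is long, so it can help to show only the services that are not running. This sketch assumes the output is a whitespace-separated table with the service name (starting with sas-viya-) in the first column and a status such as "up" in the second; verify that against your own output. The helper name is hypothetical.

```shell
# not_up: read the status listing on stdin and print only service lines
# whose status column (assumed to be column 2) is not "up".
not_up() {
  awk '$1 ~ /^sas-viya-/ && $2 != "up"'
}
```

Usage: `/etc/init.d/sas-viya-all-services status | not_up`.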




Now I know which service to start:

systemctl start sas-viya-cascontroller-default

...and check the status of the service again. Nope... that didn't help.
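Had I been more patient, the next step would have been the service's own logs. A minimal sketch for finding the most recently written log in a directory follows; the per-service log directory in the usage line is an assumption modelled on the compsrv path earlier in this post, and the helper name is hypothetical.

```shell
# latest_log DIR: print the path of the most recently modified entry in DIR.
latest_log() {
  newest=$(ls -t "$1" | head -n 1)
  [ -n "$newest" ] && printf '%s/%s\n' "$1" "$newest"
}

# systemctl status <unit> also prints the unit's most recent journal lines.
```

Usage (hypothetical path): `tail -n 50 "$(latest_log /var/log/sas/viya/cascontroller/default)"`.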




As I was not patient enough to analyze the logs, I took a shortcut and restarted all Viya services... and that resolved the issue.

service sas-viya-all-services stop
service sas-viya-all-services start



While I succeeded in identifying why and when the system was "killed", I failed to identify who ran the "killer" script...

To link the actual user with the PID, temp files and log files, PAM/LDAP authentication must be enabled within SAS. This ensures that the workspace process runs under the user's own OS account instead of a service account shared by all users.

Another way to identify the user is to analyze the AUDIT table and try to relate login events to the date/time when the work ran in Viya. However, this may not give the needed results, as many users may have logged in at the same or a similar time...


Hi, great article, thanks for sharing the story.

What is not entirely clear to me is why a SAS 9 program would fill the CAS Disk Cache.

PROC SQL is executed in SPRE. Was the created table loaded into CAS?

Hi Bogdan, CAS_DISK in this context is the name of the disk where the SAS9/SPRE workspace is located; it is NOT the "CAS DISK CACHE" used as the CAS workspace. Thanks for the question!

Version history
Last update:
11-22-2019 03:27 PM
Updated by:



