Is the following an efficient way of running several SAS programs in parallel?
Each program uses the same data, filters it by year (2006, 2007, etc.), and runs an independent regression-like analysis on each subset. The outputs are exported as CSV files with year-indexed names.
#!/bin/bash
#SBATCH -t 4:00:00
#SBATCH --job-name=SAS01
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --partition=Bigg
. /etc/profile.d/modules.sh
echo "Job running on SLURM NODELIST: $SLURM_NODELIST "
# Modules needed for this SAS job
module purge
module load SAS
#SAS Program execution command
sas /home/user1/SASprog/prog1.sas -sysparm '2006' -log /home/user1/log/proglog2006.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2007' -log /home/user1/log/proglog2007.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2008' -log /home/user1/log/proglog2008.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2009' -log /home/user1/log/proglog2009.log &
wait
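A variant of the same fan-out/wait pattern that also checks each background job's exit status, so a failed year is reported instead of silently ignored (a sketch; the real `sas` invocation is stubbed out with a placeholder function here):

```shell
# Placeholder for the real call, e.g.:
#   sas /home/user1/SASprog/prog1.sas -sysparm "$1" -log /home/user1/log/proglog$1.log
run_sas() {
    sleep 0
}

pids=""
for year in 2006 2007 2008 2009; do
    run_sas "$year" &
    pids="$pids $!"          # remember each job's PID
done

failures=0
njobs=0
for pid in $pids; do
    njobs=$((njobs + 1))
    wait "$pid" || failures=$((failures + 1))   # wait PID returns the job's exit code
done
echo "jobs: $njobs, failed: $failures"
```

With the PIDs recorded, the script can exit non-zero when any year fails, which makes the SLURM job status meaningful.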
1) Does accessing the same data in parallel create issues?
2) My understanding is each program will invoke a separate sas session. This should not create conflict in terms of the Work library right?
3) Is there a way to explicitly purge the work libraries at the end of the program? Could there be dumps of earlier work library stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?
After a few trials, I am unable to successfully run the program. Getting the following error:
ERROR: Insufficient space in file WORK.REG_DEC.DATA.
ERROR: File WORK.REG_DEC.DATA is damaged. I/O processing did not complete.
WARNING: The data set WORK.REG_DEC may be incomplete. When this step was stopped there were 949663 observations and 178 variables.
WARNING: Data set WORK.REG_DEC was not replaced because this step was stopped.
I am getting a similar error message in all logs except one.
Should I just reach out to the admin of this cluster regarding memory? Or am I doing something fundamentally wrong? Suggestions are appreciated.
Sometimes running out of WORK space in Unix is a side effect of multiple users running jobs at the same time and your SAS Admin may not be able to expand the WORK library for you. If you have access to other disk space, you can have each program point to its own work library. If you can put the different WORK libraries on different spindles, you will have better performance that way too.
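For example, each invocation can point SAS at its own WORK directory with the `-work` command-line option (a sketch; `/tmp/demo_scratch` stands in for whatever scratch space you actually have access to, and the `sas` lines are commented out as placeholders):

```shell
# Assumed stand-in for real scratch space on your cluster.
SCRATCH=/tmp/demo_scratch

for year in 2006 2007; do
    workdir="$SCRATCH/work$year"
    mkdir -p "$workdir"       # one private WORK location per job
    # sas /home/user1/SASprog/prog1.sas -sysparm "$year" \
    #     -work "$workdir" -log /home/user1/log/proglog$year.log &
done
# wait
```

On a real cluster you would substitute the filesystem your admin assigns you; if the per-year WORK directories land on different physical disks, the jobs stop competing for the same spindle.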
There are also some coding practices that lead to creating lots of WORK.data files that consume space. This Google search will get you lots of help there
efficient disk usage site:sas.com
1) Does accessing the same data in parallel create issues?
It makes the disk reads more random, so it slows the programs down if you only have one SAS table in one location.
It may even be faster to run the programs sequentially.
2) My understanding is each program will invoke a separate sas session. This should not create conflict in terms of the Work library right?
Correct
3) Is there a way to explicitly purge the work libraries at the end of the program? Could there be dumps of earlier work library stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?
SAS deletes its work libraries when it ends.
You can do intermediate purges by running proc datasets library=work kill nolist; quit;
After a few trials, I am unable to successfully run the program. Getting the following error.
You need more disk space for the WORK library to avoid this error.
The comment on 1) is true here too: the work library will be used with more random accesses when you have more processes using it.
I would run one program, then run 2 concurrently, then 3, to see how run time is impacted, and what volume triggers a full disk space error.
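A minimal harness for that experiment might look like this (a sketch; the `sas` invocation is again stubbed with a placeholder so only the timing scaffold is shown):

```shell
# Placeholder for: sas /home/user1/SASprog/prog1.sas -sysparm "$1" ...
run_one() {
    sleep 0
}

# Try 1, then 2, then 3 concurrent jobs and print the wall-clock time of each batch.
for n in 1 2 3; do
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        run_one $((2006 + i)) &
        i=$((i + 1))
    done
    wait                      # block until this batch of jobs finishes
    echo "concurrency $n: $(( $(date +%s) - start ))s"
done
```

If wall-clock time per batch grows much faster than linearly, or a batch hits the disk-full error, you have found the concurrency limit of your WORK space.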
Thanks a lot for your quick response. Ignoring issue 1 (which I wish to take care of by splitting the original data file into yearly files), is this a decent method for running SAS programs in parallel?
1- You haven't solved any disk access issue if all the "split" tables are in the same location.
2- It's been years since I used Unix, but @Kurt_Bremser would be able to comment on the script.
Another method is to manage everything from within a SAS session using MP Connect.