raulroy
Calcite | Level 5

Is the following an efficient way of running several SAS programs in parallel?

Each program uses the same data and filters it by year (2006, 2007, etc.), then runs an independent regression-like analysis on each subset. The outputs are exported as CSV files with year-indexed names.
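For context, here is a minimal sketch of how prog1.sas presumably consumes the year passed via -sysparm. The libname path, dataset, variable names, and model are placeholders, not from the original post; only the &SYSPARM mechanism matches the script below.

/* The -sysparm value arrives in the automatic macro variable SYSPARM */
%let year = &sysparm;                         /* e.g. 2006              */

libname mydata "/home/user1/data";            /* placeholder location   */

data work.subset;                             /* keep one year only     */
    set mydata.alldata;
    where yr = &year;
run;

proc reg data=work.subset outest=work.est;    /* stand-in for the       */
    model y = x1 x2;                          /* regression-like step   */
run;
quit;

proc export data=work.est                     /* year-indexed CSV name  */
    outfile="/home/user1/out/results&year..csv"
    dbms=csv replace;
run;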


#!/bin/bash

#SBATCH -t 4:00:00
#SBATCH --job-name=SAS01
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --partition=Bigg


. /etc/profile.d/modules.sh
echo "Job running on SLURM NODELIST: $SLURM_NODELIST "


# Modules needed for this SAS job

module purge

module load SAS

# SAS program execution commands: launch the four yearly runs in the
# background, then wait for all of them to finish

sas /home/user1/SASprog/prog1.sas -sysparm '2006' -log /home/user1/log/proglog2006.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2007' -log /home/user1/log/proglog2007.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2008' -log /home/user1/log/proglog2008.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2009' -log /home/user1/log/proglog2009.log &

wait

1) Does accessing the same data in parallel create issues?

 

2) My understanding is that each program will invoke a separate SAS session. This should not create conflicts in terms of the WORK library, right?

 

3) Is there a way to explicitly purge the WORK libraries at the end of the program? Could there be leftovers of earlier WORK libraries stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?

 

After a few trials, I am unable to successfully run the program. I get the following error:

 

ERROR: Insufficient space in file WORK.REG_DEC.DATA.
ERROR: File WORK.REG_DEC.DATA is damaged. I/O processing did not complete.
WARNING: The data set WORK.REG_DEC may be incomplete.  When this step was stopped there were 949663 observations and 178 variables.
WARNING: Data set WORK.REG_DEC was not replaced because this step was stopped.


I get a similar error message in all the logs except one.

 

Should I just reach out to the admin of this cluster regarding memory? Or am I doing something fundamentally wrong? Suggestions are appreciated.

1 ACCEPTED SOLUTION
Doc_Duke
Rhodochrosite | Level 12

Sometimes running out of WORK space on Unix is a side effect of multiple users running jobs at the same time, and your SAS admin may not be able to expand the WORK library for you. If you have access to other disk space, you can have each program point to its own WORK library. If you can put the different WORK libraries on different spindles, you will get better performance that way too.
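A sketch of what that could look like in the submit script; the /scratch paths are placeholders, not from the original post, and the directory must exist before SAS starts. SAS on Unix accepts a -work invocation option naming the directory for the session's WORK library:

# Hypothetical per-session WORK location for the 2006 run
mkdir -p /scratch/user1/work2006
sas /home/user1/SASprog/prog1.sas -sysparm '2006' \
    -work /scratch/user1/work2006 \
    -log /home/user1/log/proglog2006.log &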

 

There are also some coding practices that lead to creating lots of WORK data files that consume space. This Google search will get you plenty of help there:

efficient disk usage site:sas.com


4 REPLIES
ChrisNZ
Tourmaline | Level 20

1) Does accessing the same data in parallel create issues?

It makes the disk reads more random, and therefore slows the programs down if you only have one SAS table in one location.

It may be faster to run the programs sequentially.

 

2) My understanding is that each program will invoke a separate SAS session. This should not create conflicts in terms of the WORK library, right?

Correct 

 

3) Is there a way to explicitly purge the WORK libraries at the end of the program? Could there be leftovers of earlier WORK libraries stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?

SAS deletes its WORK library when the session ends.

You can do intermediate purges by running PROC DATASETS with the KILL option.
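For example (a standard idiom: KILL deletes every member of the library without prompting, and NOLIST suppresses the directory listing):

proc datasets library=work kill nolist;
quit;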

 

After a few trials, I am unable to successfully run the program. I get the following error.

You need more disk space for the WORK library to avoid this error.

The comment on 1) is true here too: the WORK library will see more random accesses when more processes are using it.

 

I would run one program, then two concurrently, then three, to see how the run time is impacted and what volume triggers a full-disk error.

 

raulroy
Calcite | Level 5

Thanks a lot for your quick response. Ignoring issue 1 (which I wish to take care of by splitting the original data file into yearly files), is this a decent method of parallel-processing SAS programs?

ChrisNZ
Tourmaline | Level 20

1- You haven't solved any disk access issue if all the "split" tables are in the same location.

 

2- It's been years since I used Unix, but @Kurt_Bremser would be able to comment on the script.

Another method is to manage everything from within a SAS session using MP Connect.
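For illustration, a minimal MP Connect sketch, assuming SAS/CONNECT is licensed and that prog1.sas reads the year from &SYSPARM as in the script above; the task names and the macro itself are made up:

options sascmd='!sascmd';             /* child sessions spawn on this host */

%macro run_years(first=2006, last=2009);
    %local y tasks;
    %do y = &first %to &last;
        signon task&y;                        /* start one child session  */
        rsubmit task&y wait=no;               /* submit asynchronously    */
            options sysparm="&y";             /* &y resolves before the
                                                 block ships to the child */
            %include "/home/user1/SASprog/prog1.sas";
        endrsubmit;
        %let tasks=&tasks task&y;
    %end;
    waitfor _all_ &tasks;                     /* block until all finish   */
    %do y = &first %to &last;
        signoff task&y;
    %end;
%mend run_years;
%run_years()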

 


