raulroy
Calcite | Level 5

Is the following an efficient way of running several SAS programs in parallel?

Each program uses the same data and filters it by year (2006, 2007, etc.), then runs an independent regression-like analysis on each subset. The outputs are exported as CSV files with year-indexed names.
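For context, here is a minimal sketch of how prog1.sas presumably consumes the year passed via -sysparm. The libname path, dataset, variable names, and model are placeholders, not from the original post; only the &SYSPARM mechanism matches the script below.

/* The -sysparm value arrives in the automatic macro variable SYSPARM */
%let year = &sysparm;                         /* e.g. 2006              */

libname mydata "/home/user1/data";            /* placeholder location   */

data work.subset;                             /* keep one year only     */
    set mydata.alldata;
    where yr = &year;
run;

proc reg data=work.subset outest=work.est;    /* stand-in for the       */
    model y = x1 x2;                          /* regression-like step   */
run;
quit;

proc export data=work.est                     /* year-indexed CSV name  */
    outfile="/home/user1/out/results&year..csv"
    dbms=csv replace;
run;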


#!/bin/bash

#SBATCH -t 4:00:00
#SBATCH --job-name=SAS01
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --partition=Bigg


. /etc/profile.d/modules.sh
echo "Job running on SLURM NODELIST: $SLURM_NODELIST "


# Modules needed for this SAS job

module purge

module load SAS

# SAS program execution commands: launch the four yearly runs in the
# background, then wait for all of them to finish

sas /home/user1/SASprog/prog1.sas -sysparm '2006' -log /home/user1/log/proglog2006.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2007' -log /home/user1/log/proglog2007.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2008' -log /home/user1/log/proglog2008.log &
sas /home/user1/SASprog/prog1.sas -sysparm '2009' -log /home/user1/log/proglog2009.log &

wait

1) Does accessing the same data in parallel create issues?

 

2) My understanding is that each program will invoke a separate SAS session. This should not create conflicts in terms of the WORK library, right?

 

3) Is there a way to explicitly purge the WORK libraries at the end of the program? Could there be leftovers of earlier WORK libraries stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?

 

After a few trials, I am unable to successfully run the program. I get the following error:

 

ERROR: Insufficient space in file WORK.REG_DEC.DATA.
ERROR: File WORK.REG_DEC.DATA is damaged. I/O processing did not complete.
WARNING: The data set WORK.REG_DEC may be incomplete.  When this step was stopped there were 949663 observations and 178 variables.
WARNING: Data set WORK.REG_DEC was not replaced because this step was stopped.


I get a similar error message in all the logs except one.

 

Should I just reach out to the admin of this cluster regarding memory? Or am I doing something fundamentally wrong? Suggestions are appreciated.

1 ACCEPTED SOLUTION
Doc_Duke
Rhodochrosite | Level 12

Sometimes running out of WORK space on Unix is a side effect of multiple users running jobs at the same time, and your SAS admin may not be able to expand the WORK library for you. If you have access to other disk space, you can have each program point to its own WORK library. If you can put the different WORK libraries on different spindles, you will get better performance that way too.
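A sketch of what that could look like in the submit script; the /scratch paths are placeholders, not from the original post, and the directory must exist before SAS starts. SAS on Unix accepts a -work invocation option naming the directory for the session's WORK library:

# Hypothetical per-session WORK location for the 2006 run
mkdir -p /scratch/user1/work2006
sas /home/user1/SASprog/prog1.sas -sysparm '2006' \
    -work /scratch/user1/work2006 \
    -log /home/user1/log/proglog2006.log &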

 

There are also some coding practices that lead to creating lots of WORK data files that consume space. This Google search will get you plenty of help there:

efficient disk usage site:sas.com


4 REPLIES
ChrisNZ
Tourmaline | Level 20

1) Does accessing the same data in parallel create issues?

It makes the disk reads more random, and therefore slows the programs down if you only have one SAS table in one location.

It may be faster to run the programs sequentially.

 

2) My understanding is that each program will invoke a separate SAS session. This should not create conflicts in terms of the WORK library, right?

Correct 

 

3) Is there a way to explicitly purge the WORK libraries at the end of the program? Could there be leftovers of earlier WORK libraries stashed somewhere I am not seeing, which in turn might be impacting subsequent memory use?

SAS deletes its WORK library when the session ends.

You can do intermediate purges by running PROC DATASETS with the KILL option.
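For example (a standard idiom: KILL deletes every member of the library without prompting, and NOLIST suppresses the directory listing):

proc datasets library=work kill nolist;
quit;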

 

After a few trials, I am unable to successfully run the program. I get the following error.

You need more disk space for the WORK library to avoid this error.

The comment on 1) is true here too: the WORK library will see more random accesses when more processes are using it.

 

I would run one program, then two concurrently, then three, to see how the run time is impacted and what volume triggers a full-disk error.

 

raulroy
Calcite | Level 5

Thanks a lot for your quick response. Ignoring issue 1 (which I wish to take care of by splitting the original data file into yearly files), is this a decent method of parallel-processing SAS programs?

ChrisNZ
Tourmaline | Level 20

1- You haven't solved any disk access issue if all the "split" tables are in the same location.

 

2- It's been years since I used Unix, but @Kurt_Bremser would be able to comment on the script.

Another method is to manage everything from within a SAS session using MP Connect.
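For illustration, a minimal MP Connect sketch, assuming SAS/CONNECT is licensed and that prog1.sas reads the year from &SYSPARM as in the script above; the task names and the macro itself are made up:

options sascmd='!sascmd';             /* child sessions spawn on this host */

%macro run_years(first=2006, last=2009);
    %local y tasks;
    %do y = &first %to &last;
        signon task&y;                        /* start one child session  */
        rsubmit task&y wait=no;               /* submit asynchronously    */
            options sysparm="&y";             /* &y resolves before the
                                                 block ships to the child */
            %include "/home/user1/SASprog/prog1.sas";
        endrsubmit;
        %let tasks=&tasks task&y;
    %end;
    waitfor _all_ &tasks;                     /* block until all finish   */
    %do y = &first %to &last;
        signoff task&y;
    %end;
%mend run_years;
%run_years()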

 


