running SAS in parallel using SLURM job arrays

Occasional Contributor
Posts: 14

running SAS in parallel using SLURM job arrays

Has anyone ever succeeded in running SAS in parallel on a SLURM-based Linux cluster computer using what are known as "job arrays"? 

 

https://rcc.uchicago.edu/docs/running-jobs/array/index.html

 

In particular, I need to pass a SLURM environment variable SLURM_ARRAY_TASK_ID from my batch shell script to SAS.  This task has completely defeated me.   

 

Google searches haven't helped.  I see mention of multi-threading, but that's not quite the same.

 

I would be so appreciative for some expert help!


All Replies
Super User
Posts: 3,238

Re: running SAS in parallel using SLURM job arrays

I don't know anything about SLURM but the SYSGET SAS function reads OS environment variables.

 

Also SAS jobs can be run in parallel using SAS Grid or SAS/CONNECT technology.

Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Unfortunately the SAS command

 

%let SLURM_ARRAY_TASK_ID=%SYSGET(SLURM_ARRAY_TASK_ID);

 

does not seem to work.  I'm going to guess that SLURM_ARRAY_TASK_ID isn't an "environment variable" in the same sense you mean, but I could be wrong.  Thanks for your reply!

 

 

Super User
Posts: 3,238

Re: running SAS in parallel using SLURM job arrays

If you could get your shell script to also write it to a "proper" OS environment variable as well that might work. Someone who is more familiar with Unix should be able to help.

Solution
a week ago
Super User
Posts: 7,447

Re: running SAS in parallel using SLURM job arrays

Either add

export SLURM_ARRAY_TASK_ID

to your shell script before starting SAS, or use the -sysparm option to hand SLURM_ARRAY_TASK_ID over on the SAS command line.
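
The difference between the two suggestions can be tried without SAS or SLURM. The sketch below uses DEMO_TASK_ID as a hypothetical stand-in for SLURM_ARRAY_TASK_ID and a second shell as a stand-in for the sas process. SLURM documents SLURM_ARRAY_TASK_ID as an environment variable it sets for the batch script, so under sbatch the export is probably redundant; that is an assumption about the cluster, not something the thread confirms.

```shell
#!/bin/bash
# DEMO_TASK_ID is a hypothetical stand-in for SLURM_ARRAY_TASK_ID.
unset DEMO_TASK_ID
DEMO_TASK_ID=2                       # plain shell variable, not exported

# A child process (another shell here; sas in the thread) cannot see it.
before_export=$(sh -c 'echo "${DEMO_TASK_ID:-unset}"')

export DEMO_TASK_ID                  # now part of the environment given to children
after_export=$(sh -c 'echo "${DEMO_TASK_ID:-unset}"')

# Passing the value as a command-line argument works with or without export.
as_argument=$(sh -c 'echo "$1"' child "$DEMO_TASK_ID")

echo "before export: $before_export"
echo "after export:  $after_export"
echo "as argument:   $as_argument"
```

The argument route is what -sysparm uses: the shell expands the variable before SAS starts, so SAS never has to look at the environment.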

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Thanks for your suggestions!  I've tried many variations, including the lines:

 

#!/bin/bash

#SBATCH --job-name=job_array
#SBATCH --output=job_array_%a_out.txt
#SBATCH --error=job_array_%a_err.txt
#SBATCH --array=1-3

echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
export SLURM_ARRAY_TASK_ID
sas -noterminal job_array.sas -sysparm $SLURM_ARRAY_TASK_ID

 

in my shell script and the lines


%let TskID=%scan(&sysparm,1);
%put &TskID;

 

in my SAS program.  Regrettably the value of TskID seems to be empty; the integers 1, 2, 3 seem not to be passing into SAS.  More help is definitely needed and appreciated!

Super User
Posts: 7,447

Re: running SAS in parallel using SLURM job arrays

With the export in place, did you try

%let tskid=%sysget(SLURM_ARRAY_TASK_ID);

?

Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Yes.  With the export in place, the SAS lines:


%let TskIDa=%SYSGET(SLURM_ARRAY_TASK_ID);
%put &TskIDa;

 

unfortunately give empty TskIDa.  Any other ideas?  I'm grateful for your help.  If passing the SLURM_ARRAY_TASK_ID parameter from shell script to SAS is impossible, then this is a disappointment.  I don't wish to give up quite yet, but I'm at a loss what to try next.

 

Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

-sysparm $SLURM_ARRAY_TASK_ID

was definitely the light at the end of the tunnel: thanks!
Super User
Posts: 6,848

Re: running SAS in parallel using SLURM job arrays

Are you sure you have a standard SAS command line script and not some local script that is "eating" your command line arguments?

Try just passing in any text to the SYSPARM option.  So make a trivial SAS program.

%put &=sysparm;

Then call it with the -sysparm option.

sas -sysparm fred myprogram.sas

In the log you should see

SYSPARM=fred

If that works then replace fred in the command string with the reference to your environment variable.
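
When swapping the literal for the variable, it may help to fail loudly if the variable is empty, since an unquoted empty expansion leaves -sysparm with no value at all. A minimal sketch, assuming a helper called sysparm_value (a made-up name, not part of SAS or SLURM):

```shell
#!/bin/bash
# sysparm_value is a hypothetical helper name, not part of SAS or SLURM.
# It prints the array task ID, or fails if the variable is empty or unset.
sysparm_value() {
    if [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "SLURM_ARRAY_TASK_ID is empty or unset" >&2
        return 1
    fi
    printf '%s\n' "$SLURM_ARRAY_TASK_ID"
}

# Intended use in the batch script (commented out so the sketch runs without SAS):
# task=$(sysparm_value) || exit 1
# sas -sysparm "$task" myprogram.sas
```

Quoting "$task" keeps the value as a single argument even if it ever contains blanks.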

 

Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Thank you.  The command in my shell script:

 

sas -noterminal trivial.sas -sysparm $SLURM_ARRAY_TASK_ID

 

plus your suggested command in my SAS program:

 

%put &=sysparm;

 

indeed gives me an integer (1, 2 or 3).  Thus something is wrong with my SAS commands:

 

%let TskID=%scan(&sysparm,1);
%put &TskID;

 

which give TskID as empty.  How should I correct these? I need, of course, to use TskID later in my program. 

Super User
Posts: 6,848

Re: running SAS in parallel using SLURM job arrays

If you want to just assume that the caller has passed the proper value in SYSPARM then just remove the %SCAN() function call.
%let tskid=&sysparm;
Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Unfortunately my attempt to store the sysparm value:

 

%let TskID=&sysparm;
%put &TskID;

 

still appears to be empty.  Why should your command:

 

%put &=sysparm;

 

work so beautifully but not mine?  I appreciate your suggestions!

 

Super User
Posts: 6,848

Re: running SAS in parallel using SLURM job arrays

This simple program should work.

1    %let TskID=&sysparm;
2    %put &TskID;
fred

There is no reason that %SCAN() shouldn't work either.  You could check whether SYSPARM has picked up any strange binary characters that are confusing SAS into thinking it has macro quoting. Try a simple data step and see if it helps. You might use SYMGET() to pull the value of SYSPARM and then use PUT with the $HEX format to see if there are any strange characters in the first 25 bytes.  You could then use CALL SYMPUTX() to generate your new macro variable.

data _null_;
  length tskid $200 ;
  tskid=symget('sysparm');
  put tskid = / tskid $hex50. ;
  call symputx('tskid',tskid);
run;

If it still doesn't work then either your shell script or your SAS code is messing up somewhere.

 

Perhaps there is something in your larger shell program that is causing the -sysparm option to not be properly set on the command line.  Perhaps somehow the environment variable reference is being deferred to some later time where it evaluates to empty? Or there is some other line in the code that is clearing the value before the call to SAS? 

 

Same thing for the SAS side. Do you have other statements that are changing TSKID or SYSPARM? Are you sure you are not creating TSKID in a local symbol table and then looking for it in the global symbol table?

Occasional Contributor
Posts: 14

Re: running SAS in parallel using SLURM job arrays

Thanks for all your thoughts.  I implemented your data step and obtained the following printout in the log file:

 

tskid=3
33202020202020202020202020202020202020202020202020

 

What is the meaning of the 50-character string?  It looks ominous. 
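
The string can be read two hex digits at a time: in ASCII, 33 is the character 3 and 20 is a blank. So the value appears to be a plain 3 followed by trailing blanks, which is what a character variable padded to its declared length looks like, with no stray binary characters. A small sketch to check this outside SAS, assuming a helper named decode_hex (hypothetical) and an even number of hex digits:

```shell
#!/bin/sh
# decode_hex is a hypothetical helper: it turns pairs of hex digits into characters.
# Assumes the input has an even number of hex digits.
decode_hex() {
    rest=$1
    while [ -n "$rest" ]; do
        byte=${rest%"${rest#??}"}    # first two hex digits
        rest=${rest#??}              # drop them from the remainder
        # hex -> three-digit octal escape -> character
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

hex=33202020202020202020202020202020202020202020202020
decoded=$(decode_hex "$hex")
echo "[$decoded]"                    # brackets make the trailing blanks visible
echo "length: ${#decoded}"
```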

 

SLURM_ARRAY_TASK_ID is analogous to a do-loop variable covering the integers {1, 2, 3}, except that all three values run at once.

 

I said the log file, but in fact I have three copies of SAS running in parallel.  Does the difficulty reside in the fact that outcomes for three simultaneous processes are being collapsed into just one file?  Maybe collisions are making this unworkable.

 

This is why I need TskID -- to distinguish between the processes -- but I don't know how to introduce it into SAS's naming of log files, etc., to prevent the collisions from taking place. 
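
One possible way to keep the three tasks apart is to give each its own log and listing file through the SAS -log and -print command-line options. In batch mode SAS normally names the log after the program, so three tasks running job_array.sas in one directory would plausibly all write job_array.log. The sketch below only prints the command each task would run, so it can be tried without SAS; sas_command_for_task is a hypothetical helper and the job_array_N file names are just an illustrative choice.

```shell
#!/bin/bash
# sas_command_for_task is a hypothetical helper; job_array.sas is the
# program name used earlier in the thread.
sas_command_for_task() {
    task_id=$1
    printf 'sas -noterminal job_array.sas -sysparm %s -log job_array_%s.log -print job_array_%s.lst\n' \
        "$task_id" "$task_id" "$task_id"
}

# Dry run: show what each array task would execute.
for id in 1 2 3; do
    sas_command_for_task "$id"
done

# In the real batch script, run the command itself instead of printing it:
# sas -noterminal job_array.sas -sysparm "$SLURM_ARRAY_TASK_ID" \
#     -log "job_array_${SLURM_ARRAY_TASK_ID}.log" \
#     -print "job_array_${SLURM_ARRAY_TASK_ID}.lst"
```

With separate logs, each task's %put output can be checked on its own rather than in whichever log happened to be written last.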

☑ This topic is solved.
