kparikh
Calcite | Level 5

Hi,

We have developed an XML generation job using SAS DI Studio. For faster processing, we use the parallel processing option on a Loop transformation (set to 8 processes) to create XML files from a dataset.

On execution, the job creates 9,999 files and then hangs, even though the dataset has more than 10,000 records. Our operating environment is Linux.

Please let me know what could be causing this.

Best,

Kaushal

1 ACCEPTED SOLUTION

RLigtenberg
SAS Employee

You may be running into the following. If you select the “Execute iterations in parallel” option, the Loop transformation generates a log file and an output (.lst) file for each iteration in the location specified by the “Location on host for log and output files” option. These files are named with a default 4-character prefix followed by the iteration number. The prefix can be changed on the Options tab, which notes that the name cannot be longer than 8 characters. With the default 4-character prefix, only 4 characters remain for the iteration number, so a maximum of 9,999 iterations can be achieved. A shorter prefix allows more iterations.
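A small sketch of the arithmetic behind this ceiling. It assumes the 8-character limit covers the prefix plus the iteration number (not the .log/.lst extension), and the prefixes used below are hypothetical examples, not the actual DI Studio default:

```python
# Assumption: the 8-character limit covers the prefix plus the iteration
# number (not the .log/.lst extension), as described in the post above.
MAX_NAME_LEN = 8

def max_iterations(prefix: str, max_name_len: int = MAX_NAME_LEN) -> int:
    """Highest iteration number that still fits after `prefix`."""
    digits = max_name_len - len(prefix)
    if digits < 1:
        raise ValueError("prefix leaves no room for an iteration number")
    return 10 ** digits - 1

# "abcd" and "ab" are hypothetical prefixes, not the DI Studio default.
for prefix in ("abcd", "ab"):
    print(prefix, max_iterations(prefix))
```

A 4-character prefix tops out at 9999, which matches the point where the job stopped; a 2-character prefix would raise the ceiling to 999999.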


Regards,

Robert


5 replies
thomp7050
Pyrite | Level 9

Kaushal,

Have you considered splitting the process and executing the job in multiple increments? You could do this through a scripting language (e.g., Python) run from the command line, by executing a stored procedure on the database from SAS, or via SAS's built-in command-line executor. Presumably, your result sets could be stored individually and then combined in a later step.

Patrick

Patrick
Opal | Level 21

@thomp7050 "Have you considered splitting the process and executing the job in multiple increments"

That's functionally what @kparikh is already doing, using DIS with an inner job run in parallel (not on a grid, as I understand it).

@kparikh

When you run a loop job with "parallel processing" for the inner job, the code generated by DIS uses rsubmit blocks, which create new SAS sessions as child processes of your outer job.

As I understand you, the job runs just fine as long as you don't select "parallel processing", because the job then executes everything sequentially in a single session.

That the job hangs at number 9999 makes me think that you're eventually hitting some threshold limiting the number of child processes you can create. http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm

Given the settings you describe, there should only be 8 active child processes at a time, but I'm not sure whether Linux settings can also limit how many child processes a parent process can create in total or within a given amount of time.
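To check the process limit that `ulimit -u` reports, here is a minimal Python sketch using the standard `resource` module (Linux only). Run it as the same user the SAS sessions run under; `describe` is a hypothetical helper name. Note that RLIMIT_NPROC caps how many processes the user has at the same time, not how many are created in total:

```python
import resource

# RLIMIT_NPROC is the per-user limit on concurrently existing processes,
# i.e. the value `ulimit -u` shows on Linux.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

def describe(limit: int) -> str:
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

print(f"max user processes: soft={describe(soft)}, hard={describe(hard)}")
```

If the soft limit is far above 8 plus whatever else the user runs, this limit is unlikely to explain a hang that depends on the iteration count.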


Do I understand correctly that you've got a source table with 10000 rows and you're creating an XML file per row with each execution of your inner loop job only reading a single row from the source table and creating a single file?

If that's the case, consider redesigning your job. Each call of the inner job spawns a child process that starts a brand-new SAS session, which takes time and consumes resources. Instead, consider splitting your source table into 8 chunks.

You could, for example, create a control table for your loop job that passes in the start and end row to be processed by the inner job (you then use these values in the inner job as your firstobs= and obs= values). Each of the 8 inner job calls then processes one eighth of the source table's rows sequentially. You can still run the 8 calls in parallel, but this way you only spawn 8 child processes in total, and you don't spend time reading your source table 10,000 times and invoking SAS 10,000 times.
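A sketch of how the rows of such a control table could be computed, written in Python for illustration (in DIS you would build the equivalent table in SAS). The function name and column names are hypothetical. Each range maps onto SAS data set options, where firstobs= is the first row and obs= is the last row number to read (not a row count):

```python
def control_table(n_rows: int, n_chunks: int = 8) -> list[dict]:
    """Split rows 1..n_rows into up to n_chunks contiguous (firstobs, obs) ranges.

    Hypothetical helper: chunk sizes differ by at most one row, and empty
    chunks are skipped when there are fewer rows than chunks.
    """
    base, extra = divmod(n_rows, n_chunks)
    rows = []
    first = 1
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        last = first + size - 1  # obs= is the last row number, inclusive
        rows.append({"chunk": i + 1, "firstobs": first, "obs": last})
        first = last + 1
    return rows

for row in control_table(10_000):
    print(row)
```

With 10,000 rows this yields eight ranges of 1,250 rows each, from firstobs=1/obs=1250 through firstobs=8751/obs=10000, so only 8 inner-job sessions are started.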



Patrick
Opal | Level 21

@RLigtenberg

Oh, that's where the 9999 comes from.


@kparikh

I'd still recommend that you redesign your DIS job so that you don't invoke 10,000 SAS sessions. Even if SAS invocation takes only 1 second and you run 8 sessions in parallel, you'd still spend more than 20 minutes of elapsed time just creating SAS sessions: 10,000 seconds / (8 × 60) = 20.8 minutes.
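The estimate above can be reproduced with a small helper. The 1-second start-up time is the same illustrative assumption as in the text, and `startup_overhead_minutes` is a hypothetical name:

```python
def startup_overhead_minutes(sessions: int, seconds_per_start: float, parallel: int) -> float:
    """Elapsed minutes spent only on starting sessions, assuming `parallel`
    sessions start concurrently and each start takes `seconds_per_start`."""
    return sessions * seconds_per_start / parallel / 60

# One session per row, as in the current job design.
per_row = startup_overhead_minutes(10_000, 1.0, 8)
# One session per chunk, as with an 8-chunk control table.
per_chunk = startup_overhead_minutes(8, 1.0, 8)

print(f"per row:   {per_row:.1f} min")
print(f"per chunk: {per_chunk * 60:.0f} s")
```

Under the same assumption, the chunked design cuts the start-up overhead from about 20.8 minutes to about 1 second.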

