
Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

New Contributor
Posts: 3

Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment


Hi,

 

We have developed an XML generation job in SAS DI Studio. For faster processing, we are using the Parallel Process feature on a Loop transformation (set to 8 processes) to create XML files from a dataset.

 

Upon execution, the job creates up to 9,999 files and then hangs, even though the dataset has more than 10,000 records. Our operating environment is Linux.

 

Please let me know what could be causing this.

 

Best,

Kaushal



All Replies
Frequent Contributor
Posts: 93

Re: Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

Kaushal,

 

Have you considered splitting the process and executing the job in multiple increments? You could do this through a scripting language (e.g., Python) via the command line, by executing a stored procedure on the database via a SAS command, or via SAS's built-in command-line executor. Presumably, your result sets could be stored individually and then combined in a later step.

 

Patrick

Respected Advisor
Posts: 4,173

Re: Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

Posted in reply to thomp7050

@thomp7050 "Have you considered splitting the process and executing the job in multiple increments"

That's functionally what @kparikh is already doing in DIS, using an inner job run in parallel (not on a grid, as I understand it).

 

@kparikh

When running a loop job with "parallel processing" for the inner job, the code generated by DIS uses rsubmit blocks, which create new SAS sessions as child processes of your outer job.

As I understand it, the job runs just fine as long as you don't select "parallel processing", because the job then executes everything sequentially in a single session.

 

That the job hangs at 9,999 makes me think you're eventually hitting some threshold limiting the number of child processes you can create: http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm

Given the settings you describe, there should only be 8 active child processes at a time, but I'm not sure whether there are also Linux settings that limit how many child processes a parent process can create in total, or within a given amount of time.
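The per-user cap mentioned above is what `ulimit -u` reports on a Linux shell. As a minimal sketch (assuming a Linux host; `RLIMIT_NPROC` is not exposed on every platform), the same limit can be read programmatically:

```python
import resource  # POSIX-only module for querying process resource limits

def user_process_limit():
    """Return the soft per-user process limit (what `ulimit -u` reports),
    or None on platforms that don't expose RLIMIT_NPROC."""
    if not hasattr(resource, "RLIMIT_NPROC"):
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft

print("per-user process limit:", user_process_limit())
```

If the reported limit is in the low thousands, a job spawning one child session per iteration could plausibly run into it.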

 

Do I understand correctly that you have a source table with 10,000 rows and you're creating one XML file per row, with each execution of your inner loop job reading a single row from the source table and creating a single file?

If that's the case, consider a redesign of your job. Each call of the inner job spawns a child process that invokes a brand-new SAS session (which takes time and consumes resources). Consider instead splitting your source table into 8 chunks.

You could, for example, create a control table for your loop job in which you pass the start and end rows to be processed by the inner job (and then use these values for the firstobs= and obs= options in the inner job). Each of the 8 inner job calls then processes an eighth of the rows of the source table, sequentially within its chunk. You can still run in parallel, of course, but this way you only spawn 8 child processes in total, and you don't spend the time to read your source table 10,000 times and invoke SAS 10,000 times.
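The chunk boundaries for such a control table come down to simple arithmetic. A minimal Python sketch (the function name and the 10,000-row count are illustrative, not from the original job) that produces the (firstobs, obs) pairs:

```python
def chunk_bounds(n_rows: int, n_chunks: int = 8):
    """Contiguous (first_row, last_row) pairs covering rows 1..n_rows,
    usable as firstobs=/obs= values for each inner-job call."""
    size, extra = divmod(n_rows, n_chunks)
    bounds, start = [], 1
    for i in range(n_chunks):
        # Spread any remainder rows over the first `extra` chunks
        end = start + size - 1 + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end + 1
    return bounds

# 10,000 rows over 8 chunks: first chunk (1, 1250), last chunk (8751, 10000)
print(chunk_bounds(10000))
```

Note that the SAS obs= option designates the last observation number (not a count), so each pair maps directly onto firstobs=start obs=end in the inner job.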

 

Solution
05-03-2017 04:27 PM
SAS Employee
Posts: 11

Re: Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

You may be running into the following. If you select the “Execute iterations in parallel” option, the Loop transformation generates a log file and an output (.lst) file for each iteration in the location specified for the “Location on host for log and output files” option. These files are named with a default 4-character prefix followed by the iteration number. The prefix can be changed on the Options tab. That tab notes that the name cannot be longer than 8 characters, so with the default 4-character prefix only 4 digits remain for the iteration number, which caps the run at 9,999 iterations. A shorter prefix will allow more iterations.
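The arithmetic behind that cap can be sketched as follows (Python for illustration; the example prefix strings are hypothetical, not actual DI Studio defaults):

```python
def max_iterations(prefix: str, name_limit: int = 8) -> int:
    """Largest iteration count before prefix + iteration number
    exceeds the name-length cap (8 characters per the Options tab)."""
    digits = name_limit - len(prefix)
    if digits <= 0:
        raise ValueError("prefix leaves no room for the iteration number")
    return 10 ** digits - 1

# A 4-character prefix (the default length) leaves 4 digits: stops at 9999
print(max_iterations("iter"))   # 9999
# A 2-character prefix leaves 6 digits: room for 999,999 iterations
print(max_iterations("it"))     # 999999
```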

 

Regards,

Robert

Respected Advisor
Posts: 4,173

Re: Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

Posted in reply to RobertLigtenberg

@RobertLigtenberg

Oh, that's where the 9999 comes from.

 

@kparikh

I'd still recommend that you redesign your DIS job so that you don't invoke 10,000 SAS sessions. Even if SAS invocation takes only 1 second, and even if you run 8 sessions in parallel, you'd still spend more than 20 minutes of elapsed time just creating SAS sessions: 10,000 seconds / (8 × 60) ≈ 20.8 minutes.

New Contributor
Posts: 3

Re: Executing a Parallel Process in DI Studio stops at 9999 records in Linux Environment

Posted in reply to RobertLigtenberg
Thank you very much!!
☑ This topic is solved.

