Hi all,
I have created a process job which contains 15 data jobs, and it takes about 10 hours to run against 1.6 million records. Is it possible to run the data jobs in parallel so that the run time can be reduced a bit?
Thanks and Regards
Joydip Ghosh
Feature-wise, yes. How to do it depends on the application.
To be able to help you with the how, please elaborate on how your job flow is built. How many separate data streams do you have, what are the data/job dependencies, and what hardware resources are available to you?
Hi Linus,
The job has 15 data flows. All of them depend on the 1st data flow node, which creates one value, but the other 14 are independent of each other. Right now they run sequentially: after the 1st node completes, the 2nd starts; after the 2nd completes, the 3rd starts; and so on. If we could run them in parallel, I think it would be much faster. How can we do that? Do we need the Fork node or the Parallel Iteration node?
Thanks and Regards
Joydip
If I understand you correctly and there are 14 different jobs, then I would build the job dependencies using the Schedule Manager plugin in Management Console.
If there is one job but with different sets of input data, I would solve it using the Loop transformation in DI Studio. Parallelism is managed by setting its parameters accordingly.
Joydip,
You can run the first data job then link it to a Fork Node which would contain all of the other 14 data jobs. The Fork node will allow the 14 jobs to run in parallel once the first node completes.
See attached screenshot.
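If it helps to picture what the Fork node is doing, the same run-one-then-fan-out pattern can be sketched outside of DataFlux. This is only a rough Python illustration (the job functions and names are placeholders, not your actual data jobs): the first node runs to completion, then the remaining 14 are dispatched to run concurrently.

# Rough illustration only -- not DataFlux code; job functions are placeholders.
from concurrent.futures import ThreadPoolExecutor

def run_first_job():
    # Produces the value the other 14 jobs depend on (placeholder).
    return "shared_value"

def run_data_job(job_id, shared_value):
    # Stands in for one of the 14 independent data jobs (placeholder).
    print(f"job {job_id} running with {shared_value}")

shared_value = run_first_job()          # step 1 must finish first

# Fan out: the 14 independent jobs run at the same time,
# which is what the Fork node does inside the process job.
with ThreadPoolExecutor(max_workers=14) as pool:
    for job_id in range(2, 16):
        pool.submit(run_data_job, job_id, shared_value)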
-shawn
Hi Linus, Shawn,
Thanks for the update about using Fork, but we are still facing a couple of issues:
a) Sometimes we run out of memory. Is there any way to control how much memory is allocated?
b) All of our data jobs write their exceptions to the same table, so the server refuses the connection when two threads try to update that table at the same time, and the job fails. Is there an alternative way to handle this?
All suggestions are welcome.
Thanks and Regards
Joydip
Joydip,
Specifically, which node is failing due to memory issues? There are node-specific memory tweaks you can make. Also, how much physical memory is available on the computer that is running this job? And what type of table are you writing the exceptions to? It sounds like it may be a table or database that does not allow simultaneous connections.
-shawn
Hi Shawn,
Thanks for your reply. My system has 2 GB of RAM, and the database is a SQL Server database. We write the exceptions to the exception table ourselves, using macros to connect to the database. We use an Expression node in which we open the connection, write the exception records, and close the connection, and it is this Expression node that is failing due to the memory issue.
Thanks and Regards.
Joydip
Joydip,
SQL Server uses pessimistic concurrency by default; this is stated in the official MS documentation. Have you thought about using the Data Target (Insert) node and passing the macros into it? Opening and closing the database connection through an expression may be another reason the job is failing. You can also control commit intervals in the Data Target (Insert) node, which could help performance.
Pessimistic Concurrency
Default behavior: acquire locks to block access to data that another process is using.
Optimistic Concurrency
Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying.
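To show what the commit-interval idea amounts to outside of the Data Target (Insert) node, here is a rough Python sketch; pyodbc, the table name, and the column names are just assumptions for illustration, not your actual setup. The idea: one connection per job kept open for the run, inserts committed in batches rather than per row, and a short retry if another job happens to hold a lock on the exception table.

# Rough illustration only -- not DataFlux code; pyodbc and table/column names are assumptions.
import time
import pyodbc

COMMIT_INTERVAL = 500   # rows per commit, like the commit interval on Data Target (Insert)

def insert_batch(cur, batch, retries=3):
    # Retry briefly if another job holds a lock on the exception table.
    for attempt in range(retries):
        try:
            cur.executemany(
                "INSERT INTO exception_log (job_name, message) VALUES (?, ?)", batch)
            return
        except pyodbc.Error:
            if attempt == retries - 1:
                raise
            time.sleep(1)                    # back off and try again

def write_exceptions(conn_str, rows):
    conn = pyodbc.connect(conn_str)          # one connection per job, kept open
    cur = conn.cursor()
    batch = []
    for row in rows:                         # rows: (job_name, message) tuples
        batch.append(row)
        if len(batch) >= COMMIT_INTERVAL:
            insert_batch(cur, batch)
            conn.commit()                    # commit in chunks, not per row
            batch = []
    if batch:
        insert_batch(cur, batch)
        conn.commit()
    conn.close()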
Hope this helps,
-shawn
So, you work in DataFlux; that information could have been useful... 😉