Joydip
Calcite | Level 5

Hi all,

I have created a process job that contains 15 data jobs. Running it against 1.6 million records takes about 10 hours. Is it possible to run the data jobs in parallel to reduce the run time?

Thanks and Regards

Joydip Ghosh

LinusH
Tourmaline | Level 20

Feature-wise, yes. How depends on the application.

Data never sleeps
LinusH
Tourmaline | Level 20

To be able to help you with the how, please elaborate on how your job flow is built. How many separate data streams do you have, what are the data/job dependencies, and what hardware resources are available to you?

Data never sleeps
Joydip
Calcite | Level 5

Hi Linus,

The job has 15 data flows. All of them depend on the first data flow node, which creates one value, but the other 14 are independent of each other. At the moment they run sequentially: the 2nd node starts after the 1st completes, the 3rd starts after the 2nd completes, and so on. I think running them in parallel would be faster. How can we do that? Do we need the Fork or Parallel Iteration node transforms?

Thanks and Regards

Joydip

LinusH
Tourmaline | Level 20

If I understand you correctly and there are 14 different jobs, I would build the job dependencies using the Schedule Manager plug-in in Management Console.

If there is one job that runs against different sets of input data, I would solve it with the Loop transformation in DI Studio. Parallelism is managed by setting the transformation's parameters accordingly.

Data never sleeps
skillman
SAS Employee

Joydip,

You can run the first data job then link it to a Fork Node which would contain all of the other 14 data jobs. The Fork node will allow the 14 jobs to run in parallel once the first node completes.
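The fork pattern described above can be sketched outside the tool as follows. This is only an illustration in Python, not DataFlux configuration: `first_job`, `dependent_job`, `run_process`, and the `max_workers` cap are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def first_job():
    """Hypothetical stand-in for the first data job, which creates one value."""
    return 100


def dependent_job(job_id, shared_value):
    """Hypothetical stand-in for one of the 14 independent data jobs."""
    return job_id, shared_value + job_id


def run_process(max_workers=4):
    # The first job must complete before any of the others start.
    shared_value = first_job()
    # Fan out: the remaining 14 jobs run concurrently, at most
    # max_workers at a time (an assumed knob, not a DataFlux setting).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(dependent_job, job_id, shared_value)
            for job_id in range(2, 16)
        ]
        # Join: collect every result before the process is considered done.
        return dict(future.result() for future in futures)


results = run_process()
```

Capping the number of concurrent workers is one way to trade speed against memory use when the machine is small.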

See attached screenshot.

-shawn


fork_node.jpg
Joydip
Calcite | Level 5

Hi Linus, Shawn,

Thanks for the suggestion to use Fork, but we are still facing several issues.

First, the job sometimes runs out of memory. Is there any way to control how much memory is allocated?

Second, all our data jobs write their exceptions to the same table. When two threads try to update that table at the same time, the server refuses the connection and the job fails. Is there an alternative approach?

All suggestions are welcome.

Thanks and Regards

Joydip

skillman
SAS Employee

Joydip,

Specifically, which node is failing due to memory issues? There are node-specific memory tweaks you can make. Also, how much physical memory is available on the computer that is running this job? What type of table are you writing the exceptions to? It sounds like it may be a table or a database that does not allow simultaneous connections.

-shawn

Joydip
Calcite | Level 5

Hi Shawn,

Thanks for your reply. My system has 2 GB of RAM, and the database is a SQL Server database. We write the exceptions to a table in that database: an expression node uses macros to open a connection, write the exception records to the exception table, and close the connection again. This expression node is the one failing due to the memory issue.

Thanks and Regards.

Joydip

skillman
SAS Employee

Joydip,

SQL Server uses pessimistic concurrency by default; this is stated in the official Microsoft documentation. Have you thought about using the Data Target (Insert) node and passing the macros into it? Opening and closing the database connection through an expression may be another reason the job is failing. You can also control commit intervals in the Data Target (Insert) node, which could help performance.
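To illustrate what a commit interval does, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for SQL Server, and the `exceptions` table, its columns, and `insert_with_commit_interval` are assumptions made up for the example; it is not how the Data Target (Insert) node is implemented.

```python
import sqlite3


def insert_with_commit_interval(conn, rows, commit_interval=1000):
    """Insert rows over one open connection, committing every commit_interval rows."""
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute(
            "INSERT INTO exceptions (job_id, message) VALUES (?, ?)", row
        )
        pending += 1
        if pending >= commit_interval:
            conn.commit()  # flush one batch
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits


# In-memory SQLite stands in for the real exception table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exceptions (job_id INTEGER, message TEXT)")
rows = [(i % 14 + 2, f"exception {i}") for i in range(2500)]
commits = insert_with_commit_interval(conn, rows, commit_interval=1000)
row_count = conn.execute("SELECT COUNT(*) FROM exceptions").fetchone()[0]
```

The connection is opened once and reused for every row, rather than being opened and closed per write, and a larger interval means fewer, larger transactions.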

Pessimistic Concurrency

Default behavior: acquire locks to block access to data that another process is using.

Optimistic Concurrency

Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying.
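As a rough illustration of the optimistic approach, the sketch below uses a version column: an update succeeds only if the row still carries the version the writer originally read. It is written in Python against an in-memory SQLite database, and the `counters` table and `optimistic_update` function are hypothetical names for this example, not part of SQL Server or DataFlux.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO counters VALUES (1, 0, 0)")
conn.commit()


def optimistic_update(conn, row_id, new_value, expected_version):
    """Apply the update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE counters SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, row_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version no longer matches, i.e. a conflict.
    return cur.rowcount == 1


# Two writers both read version 0; only the first update is applied.
first = optimistic_update(conn, 1, 10, expected_version=0)
second = optimistic_update(conn, 1, 20, expected_version=0)
final_value = conn.execute("SELECT value FROM counters WHERE id = 1").fetchone()[0]
```

No lock is held between the read and the write; the losing writer detects the conflict and has to re-read and retry.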

Hope this helps,

-shawn

LinusH
Tourmaline | Level 20

So, you work in DataFlux. That information would have been useful... 😉

Data never sleeps

