Occasionally, questions about Data Management products come up through channels outside of the Data Management Community. Rather than simply answer the question in a vacuum and not publicize it, we thought we would give you, the community members, a chance to post your thoughts on the answer. After a week, we will post the answer provided by our internal product experts so that you can see how the various responses compared.
This week we present a question about commit intervals. As a community member, what would be your approach?
“Does anyone have guidelines around what the commit interval should be when using the Data Output Insert or Update nodes? Generally I’m a believer in the fewer commits the better (i.e., more rows per commit), but I could listen to other arguments. I also understand that it might depend on the application – a real-time query (so 1 row per commit) vs. a batch job (lots of rows per commit).”
Hi Patrick, I am sorry for the delay. Our internal product experts took a different approach to answering this use case scenario question.
"Since every commit requires opening and closing the connection to the data, you are correct that you should do them no more often than necessary. But you also don’t want them building up too much – it can cause timing issues or memory usage problems. For large (multi-million record) table writes, a commit every 1000 records or so doesn’t have a significant impact on write time."
As a community member, would you take this approach? Or would you approach this scenario with a different solution? I look forward to seeing your responses!
Anna-Marie
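To make the quoted guidance concrete, here is a minimal sketch of the "commit every N rows" pattern using plain Python and the built-in sqlite3 module. This is only an illustration of the batching idea, not the Data Output Insert node itself; the table, column names, and interval value are assumptions for the example.

```python
import sqlite3

COMMIT_INTERVAL = 1000  # rows per commit, per the guidance quoted above

def load_rows(rows, db_path="target.db"):
    """Insert rows one at a time, committing every COMMIT_INTERVAL rows."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending >= COMMIT_INTERVAL:
            conn.commit()   # flush the batch so uncommitted work doesn't build up
            pending = 0
    conn.commit()           # commit any remaining rows in the final partial batch
    conn.close()

# Example: load one million synthetic rows
load_rows((i, f"name_{i}") for i in range(1_000_000))
```

The same trade-off applies regardless of tool: a larger interval means fewer round trips to commit, while a smaller interval limits how much uncommitted work is lost or redone on failure.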
Perhaps the answer is in your question: it depends...
If one could wish for something, it would be that the product would help in choosing the right commit interval/size - it is quite a technical parameter. Maybe some kind of selection from different scenarios, or even an optional wizard for complicated situations...
Apart from the things you mention, other factors are functionality in the target database, ease of reload/rerun, risk of failure, and, of course, performance requirements.
Late to the party...
I'm very skeptical that a "technical" decision could be made. There are too many subjective factors. The key things that I consider:
Online versus batch - probably THE most important;
Cost of redoing the work - relates to the previous, but could have other issues;
Importance of not losing the work - same idea;
Data volumes, size of records, number of associated tables;
Technical factors relating to the data management tool - what works best with Oracle doesn't necessarily work best with Teradata;
Nature of the processor complex that I'm running on - power of processors, speed of disk, etc.;
Length of time of the update window, in cases of batch.
I don't see how a software product could know enough about most of these factors to correctly implement a decision.
Looking forward to seeing the "correct" answer!
Tom
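Since most of the factors Tom lists are specific to the environment (database, hardware, update window), one practical option is simply to measure a few candidate intervals before settling on one. Below is a hedged sketch of such a timing test, again in plain Python with sqlite3 purely for illustration; the row count, row shape, and candidate intervals are made-up values, and real numbers would come from running a comparable test against the actual target database.

```python
import sqlite3
import time

def time_load(commit_interval, n_rows=200_000):
    """Time inserting n_rows rows using the given commit interval."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    start = time.perf_counter()
    for i in range(n_rows):
        cur.execute("INSERT INTO t VALUES (?, ?)", (i, "x" * 20))
        if (i + 1) % commit_interval == 0:
            conn.commit()
    conn.commit()  # commit the final partial batch
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

# Compare a few candidate intervals on this machine
for interval in (1, 100, 1_000, 10_000):
    print(f"commit every {interval:>6} rows: {time_load(interval):.2f}s")
```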