
Advanced Scenario: Commit Intervals

SAS Employee
Posts: 25

Advanced Scenario: Commit Intervals

Occasionally, questions about Data Management products come up through channels outside of the Data Management Community. Rather than simply answer the question in a vacuum, we thought we would give you, the community members, a chance to post your thoughts on the answer. After a week, we will post the answer that was provided by our internal product experts so that you can see how the various responses compared.

This week we present a question about commit intervals. As a community member, what would be your approach?

“Does anyone have guidelines around what the commit intervals should be when using the Data Output Insert or Update nodes? Generally I’m a believer in the fewer commits the better (i.e., more rows per commit), but I could listen to other arguments. I also understand that it might depend on the application – a real-time query (so 1 row per commit) vs. a batch job (lots of rows per commit).”
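To make the two extremes concrete, here is a minimal sketch using SAS/ACCESS data set options. The libref ora, the target table, and the staged WORK tables are hypothetical placeholders, and the DBCOMMIT=/INSERTBUFF= values are illustrative only.

/* Real-time style: commit after every inserted row (safest, slowest). */
proc append base=ora.target (dbcommit=1)
            data=work.one_row_stage;
run;

/* Batch style: buffer inserts and commit every 10,000 rows. */
proc append base=ora.target (dbcommit=10000 insertbuff=10000)
            data=work.big_stage;
run;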


Accepted Solution
02-18-2014 01:51 PM
SAS Employee
Posts: 25

Re: Advanced Scenario: Commit Intervals

Hi Patrick, I am sorry for the delay. Our internal product experts took a different approach to answering this scenario question.

"Since every commit requires opening and closing the connection to the data, you are correct that you should do them no more often than necessary. But you also don’t want them building up too much – it can cause timing issues or memory usage problems. For large (multi-million record) table writes, a commit every 1000 records or so doesn’t have a significant impact on write time."

As a community member, would you take this approach? Or would you approach this scenario with a different solution? I look forward to seeing your responses!

Anna-Marie



All Replies
Super User
Posts: 5,256

Re: Advanced Scenario: Commit Intervals

Perhaps the answer is in your question: it depends...

If one could wish for something, it would be that the product would help in choosing the right commit interval/size - it is quite a technical parameter. Maybe some kind of selection from different scenarios, even an optional wizard for complicated situations...

Apart from the things you mention, other factors are functionality in the target database, ease of reload/rerun, risk of failure, and, of course, performance requirements.

Data never sleeps
Respected Advisor
Posts: 3,887

Re: Advanced Scenario: Commit Intervals

Hi

"After a week, we will post the answer that was provided by our internal product experts so that you can see how the various responses compared."

Where can I find this answer?

Thanks

Patrick

PROC Star
Posts: 1,090

Re: Advanced Scenario: Commit Intervals

Late to the party...

I'm very skeptical that a "technical" decision could be made. There are too many subjective factors. The key things that I consider:

Online versus batch - probably THE most important;

Cost of redoing the work - relates to the previous, but could have other issues;

Importance of not losing the work - same idea;

Data volumes, size of records, number of associated tables;

Technical factors relating to the data management tool - what works best with Oracle doesn't necessarily work best with Teradata;

Nature of the processor complex that I'm running on - power of processors, speed of disk, etc.;

Length of time of the update window, in cases of batch.

I don't see how a software product could know enough about most of these factors to correctly implement a decision.

Looking forward to seeing the "correct" answer!

  Tom
