Occasionally, questions about Data Management products come up through channels outside of the Data Management Community. Rather than simply answer the question in a vacuum and not publicize it, we thought we would give you, the community members, a chance to post your thoughts on the answer. After a week, we will post the answer provided by our internal product experts so that you can see how the various responses compared.
This week we present a question about commit intervals. As a community member, what would be your approach?
“Does anyone have guidelines around what the commit interval should be when using the Data Output Insert or Update nodes? Generally I’m a believer in the fewer commits the better (i.e., more rows per commit), but I could listen to other arguments. I also understand that it might depend on the application – a real-time query (so 1 row per commit) vs. a batch job (lots of rows per commit).”
Hi Patrick, I am sorry for the delay. Our internal product experts took a different approach to answering this use case scenario question.
"Since every commit requires opening and closing the connection to the data, you are correct that you should do them no more often than necessary. But you also don’t want them building up too much – it can cause timing issues or memory usage problems. For large (multi-million record) table writes, a commit every 1000 records or so doesn’t have a significant impact on write time."
As a community member, would you take this approach? Or would you approach this scenario with a different solution? I look forward to seeing your responses!
Anna-Marie
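To make the quoted guidance concrete, here is a minimal sketch of the "commit every N rows" pattern using plain Python and the built-in sqlite3 module. This is only an illustration of the batching idea, not the Data Output Insert node itself; the table, column names, and interval value are assumptions for the example.

```python
import sqlite3

COMMIT_INTERVAL = 1000  # rows per commit, per the guidance quoted above

def load_rows(rows, db_path="target.db"):
    """Insert rows one at a time, committing every COMMIT_INTERVAL rows."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending >= COMMIT_INTERVAL:
            conn.commit()   # flush the batch so uncommitted work doesn't build up
            pending = 0
    conn.commit()           # commit any remaining rows in the final partial batch
    conn.close()

# Example: load one million synthetic rows
load_rows((i, f"name_{i}") for i in range(1_000_000))
```

The same trade-off applies regardless of tool: a larger interval means fewer round trips to commit, while a smaller interval limits how much uncommitted work is lost or redone on failure.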
Perhaps the answer is in your question: it depends...
If one could wish for something, it would be that the product would help in choosing the right commit interval/size - it is quite a technical parameter. Maybe some kind of selection from different scenarios, or even an optional wizard for complicated situations...
Apart from the things you mention, other factors are functionality in the target database, ease of reload/rerun, risk of failure, and, of course, performance requirements.
Late to the party...
I'm very skeptical that a "technical" decision could be made. There are too many subjective factors. The key things that I consider:
Online versus batch - probably THE most important;
Cost of redoing the work - relates to the previous, but could have other issues;
Importance of not losing the work - same idea;
Data volumes, size of records, number of associated tables;
Technical factors relating to the data management tool - what works best with Oracle doesn't necessarily work best with Teradata;
Nature of the processor complex that I'm running on - power of processors, speed of disk, etc.;
Length of time of the update window, in cases of batch.
I don't see how a software product could know enough about most of these factors to correctly implement a decision.
Looking forward to seeing the "correct" answer!
Tom
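Since most of the factors Tom lists are specific to the environment (database, hardware, update window), one practical option is simply to measure a few candidate intervals before settling on one. Below is a hedged sketch of such a timing test, again in plain Python with sqlite3 purely for illustration; the row count, row shape, and candidate intervals are made-up values, and real numbers would come from running a comparable test against the actual target database.

```python
import sqlite3
import time

def time_load(commit_interval, n_rows=200_000):
    """Time inserting n_rows rows using the given commit interval."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    start = time.perf_counter()
    for i in range(n_rows):
        cur.execute("INSERT INTO t VALUES (?, ?)", (i, "x" * 20))
        if (i + 1) % commit_interval == 0:
            conn.commit()
    conn.commit()  # commit the final partial batch
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

# Compare a few candidate intervals on this machine
for interval in (1, 100, 1_000, 10_000):
    print(f"commit every {interval:>6} rows: {time_load(interval):.2f}s")
```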