I'm trying to understand the relationship between Bulk_Row_Count, Commit Intervals, and Max_Error_Count in the advanced properties of Data Management Studio, especially against an Oracle database. For example, if I set Bulk_Row_Count to 1,000, should the commit interval be set to 1,000 as well? In terms of performance, is there a scenario where these two values should be different?
Also, I need to load 1.5+ million records on a regular basis and was considering using the Bulk_Row_Count option. Each one of these records is important and needs to be accounted for. In this scenario, would I set Max_Error_Count to 0 (which I assume fails the job after it encounters the first error) along with setting Bulk_Row_Count, or does setting Max_Error_Count to 0 nullify the performance gains from Bulk_Row_Count? I could also use the Status_Field property to determine whether an inserted record failed, but is that advisable when using Bulk_Row_Count?
Hi - here are some details about these options:
BULK_ROW_COUNT - This lets you process a set number of rows at a time. Enter the number of rows to be processed in the Property Value field when you have thousands of records; this helps your system run more efficiently. The default value is 10,000. Bulk row count is supported on databases such as Oracle, SQL Server, and DB2. There is no need to activate a bulk load option at the driver level in the data connection for the table.
MAX_ERROR_COUNT - This lets you set a threshold for the maximum number of errors. If set to -1, an unlimited number of errors is allowed. If you want to ensure that no data rows are left out of the commit to the database, set this to 0.
COMMIT_EVERY - This sets how often a commit command should be sent to the data source.
In some cases where many rows of data need to be loaded or updated into a database, you can use a bulk loading feature built into the “Data Target (Insert)” node to drastically improve performance.
For example, you can set the maximum number of rows to bulk load to 50,000 and set the commit interval to 1. This bulk loads batches of 50,000 rows into the database very quickly, because it avoids the frequent round trips to the database server that would occur with smaller packets of data.
Experiment with the commit level setting. As a general rule, committing every row to the database is slower than committing a group of rows, except in this special case: use the bulk count and commit level together (for example, commit level 1 and bulk count 50,000), so that each commit covers one bulk batch, to achieve superior load/update times.
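To see why this helps outside of the tool, here is a rough sketch of the underlying pattern with a hypothetical python-oracledb connection and placeholder table/column names. It only illustrates batched inserts with one commit per batch; it is not what the node actually does internally.

    import oracledb  # assumption: the python-oracledb driver is available

    # Stand-in for the real source; in practice this would read the actual input rows.
    def source_rows():
        for i in range(1_500_000):
            yield (i, "value %d" % i)

    conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")  # placeholder credentials
    cur = conn.cursor()

    BATCH_SIZE = 50000   # roughly the BULK_ROW_COUNT idea
    batch = []
    for row in source_rows():
        batch.append(row)
        if len(batch) == BATCH_SIZE:
            # One array insert per batch instead of 50,000 single-row round trips...
            cur.executemany("INSERT INTO target_table (id, val) VALUES (:1, :2)", batch)
            conn.commit()  # ...and one commit per batch (the "commit level 1" idea)
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany("INSERT INTO target_table (id, val) VALUES (:1, :2)", batch)
        conn.commit()

    cur.close()
    conn.close()

The point is simply that fewer, larger round trips and fewer commits mean less per-row overhead; BULK_ROW_COUNT and the commit interval control the same trade-off for you inside Data Management Studio.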
Ron
Thanks Ron. That was very helpful. I'll experiment with the Commit levels to find the optimal update time.
Kevin
For me, in version 2.6 with a Windows client and a Linux server, these are my experiences:
When you have a lot of columns, you have to decrease the setting for bulk_row_count to 25000 or even less.
This works for both Insert and Update.
You should not enable bulkload in the driver and WireProtocolMode should be 2.
You have to check the number of records after an insert, because in some cases the job finished without errors but not all records were inserted! (See the count-check sketch at the end of this post.)
Test it on both the client and the server; for me these are different operating systems.
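For the check, something along these lines works (a rough sketch with a hypothetical python-oracledb connection and placeholder names; the expected count would come from your source or the job log):

    import oracledb  # assumption: the python-oracledb driver is available

    conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")  # placeholder credentials
    cur = conn.cursor()

    expected = 1_500_000  # e.g. the row count read from the source / reported by the job
    cur.execute("SELECT COUNT(*) FROM target_table")
    loaded = cur.fetchone()[0]

    if loaded != expected:
        print("Row count mismatch: expected %d, loaded %d" % (expected, loaded))
    else:
        print("All %d rows accounted for" % loaded)

    cur.close()
    conn.close()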
Hi Rob,
Thanks for the info. Do you know if having a primary key on the table that you try to bulk load to affects performance?
Kevin
With the insert there was no primary key or other indexes, so I don't know the impact and did not test it.