SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

Occasional Contributor
Posts: 14

Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

I'm trying to understand the relationship between Bulk_Row_Count, Commit Intervals, and Max_Error_Count in the advanced properties of Data Management Studio, especially against an Oracle database. For example, if I set Bulk_Row_Count to 1,000, should the commit interval be set to 1,000 as well? In terms of performance, is there a scenario where these two values should differ?

 

Also, I need to load 1.5+ million records on a regular basis and was considering using the Bulk_Row_Count option. Each one of these records is important and needs to be accounted for. In this scenario, would I set Max_Error_Count to 0 (which I assume fails the job after it encounters the first error) along with setting Bulk_Row_Count, or does setting Max_Error_Count to 0 nullify the performance gains from Bulk_Row_Count? I could also use the Status_Field property to determine if an inserted record failed, but would this be advisable when using Bulk_Row_Count?


Accepted Solutions
Solution
10-24-2016 08:46 AM
SAS Super FREQ
Posts: 90

Re: Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

Hi - here are some details about these options:

 

BULK_ROW_COUNT - This allows you to process a set number of rows at a time. Enter the number of rows to be processed in the Property Value field when you have thousands of records; this option helps your system run more efficiently. The default value is 10,000. Bulk row count is supported on databases such as Oracle, SQL Server, and DB2. There is no need to activate a bulkload option at the driver level in the data connection for the table.

 

MAX_ERROR_COUNT - This allows you to set a threshold for the maximum number of errors. If set to -1, an unlimited number of errors is allowed. If you want to ensure that no data rows are left out of the commit to the database, set this to 0.
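Read literally, the threshold behaves like the sketch below (a hypothetical Python model of the rule just described, not DMS code; the function name is invented for illustration):

```python
def should_abort(error_count, max_error_count):
    """Decide whether a load job should fail, per the MAX_ERROR_COUNT rule:
    -1 allows unlimited errors; otherwise abort once the running error
    count exceeds the threshold (so 0 aborts on the very first error)."""
    if max_error_count == -1:
        return False
    return error_count > max_error_count
```

So with MAX_ERROR_COUNT set to 0, the first failed row stops the job, as the original question assumed.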

 

COMMIT_EVERY - This sets how often a commit command should be sent to the data source.

 

In some cases where many rows of data need to be loaded or updated into a database, you can use a bulk loading feature built into the “Data Target (Insert)” node to drastically improve performance.


For example, you can set the maximum number of rows to bulk load to “50000” and set the commit interval to “1.” This bulk loads batches of 50,000 rows of data into the database very quickly, because it avoids the frequent communication with the database server that would occur with smaller packets of data.

 

Experiment with the commit level setting. As a general rule, committing every row to the database will be slower than committing a group of rows, except in this special case: use the bulk count and commit level together (for example, setting the commit level to 1 and the bulk number to 50000) to achieve superior load/update times.
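The arithmetic behind that advice can be illustrated with a rough model (an illustration only; it assumes each bulk batch costs one round trip to the server, which is a simplification):

```python
import math

def bulk_batches(total_rows, bulk_row_count):
    """Number of batches (and, with a commit interval of 1, commits)
    needed to send total_rows in chunks of bulk_row_count."""
    return math.ceil(total_rows / bulk_row_count)

# 1.5 million rows, as in the original question:
print(bulk_batches(1_500_000, 50_000))  # 30 bulk batches, versus
print(bulk_batches(1_500_000, 1))       # 1500000 row-at-a-time sends
```

Thirty round trips instead of 1.5 million is where the performance gain comes from; the real saving per trip depends on the driver and network, so experimenting as suggested above is still the right approach.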

 

Ron



All Replies

Occasional Contributor
Posts: 14

Re: Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

Thanks Ron. That was very helpful. I'll experiment with the Commit levels to find the optimal update time.

 

Kevin

New Contributor
Posts: 2

Re: Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

In version 2.6, with a Windows client and a Linux server, I have had the following experiences:

When you have a lot of columns, you have to decrease the setting for bulk_row_count to 25000 or even less.

This works for both Insert and Update.

You should not enable bulkload in the driver and WireProtocolMode should be 2.

You have to check the number of records after an insert, because in some cases the job completed successfully but not all records were inserted!

Test it on both the client and the server; in my case these run different operating systems.
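The post-insert record-count check recommended above can be sketched like this (using Python's sqlite3 module as a stand-in for the target database; the table, columns, and sample data are hypothetical — the point is comparing the source row count against the loaded count):

```python
import sqlite3

# Stand-in source data; in practice this would be the input to the load job.
source_rows = [(i, f"name{i}") for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", source_rows)
conn.commit()

# Verify that every source row actually landed in the target table.
expected = len(source_rows)
actual = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
if actual != expected:
    raise RuntimeError(f"Load incomplete: expected {expected}, got {actual}")
```

Against Oracle the same `SELECT COUNT(*)` comparison applies; only the connection mechanism differs.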

Occasional Contributor
Posts: 14

Re: Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

Hi Rob,

 

Thanks for the info. Do you know if having a primary key on the table that you try to bulk load to affects performance?

 

Kevin

New Contributor
Posts: 2

Re: Bulk_Row_Count, Commit Intervals, and Max_Error_Count in Data Management Studio 2.7

With the insert, there was no primary key or other indexes, so I don't know its impact and did not test it.

 

☑ This topic is SOLVED.
