Have your SSIS Fast Load (Bulk Load) and Row by Row Error Messages too
SSIS OLEDB Destination
In the SSIS OLEDB Destination component there are several Data Access options to select from. For this post we will only be discussing “Table or view” and “Table or view – fast load”. When you choose the fast load option, you are also given a few other options to set. Rows per batch and Maximum insert commit size have always confused me because they sound similar, so I dug around a bit for the definitions.
Since fast load uses a BULK INSERT, Rows Per Batch and Maximum Insert Commit Size are actually arguments for the BULK INSERT operation. Maximum Insert Commit Size equates to the BATCHSIZE argument, and Rows Per Batch equals… ROWS_PER_BATCH. See this post on MSDN for a bit more.
Knowing that, we can define the OLEDB Destination fast load options using the definitions from BOL:
Rows Per Batch (ROWS_PER_BATCH)
“Indicates the approximate number of rows of data in the data file. By default, all the data in the data file is sent to the server as a single transaction, and the number of rows in the batch is unknown to the query optimizer. If you specify ROWS_PER_BATCH (with a value > 0) the server uses this value to optimize the bulk-import operation. The value specified for ROWS_PER_BATCH should be approximately the same as the actual number of rows.”
Maximum Insert Commit Size (BATCHSIZE)
“Specifies the number of rows in a batch. Each batch is copied to the server as one transaction. If this fails, SQL Server commits or rolls back the transaction for every batch. By default, all data in the specified data file is one batch.”
So by changing the Maximum Insert Commit Size, we can change the number of rows that will be committed at one time. SSIS will continue to BULK INSERT batches of this size until all of the rows in the pipeline have been inserted (or failed). And, if I understand the BOL entry correctly, Rows Per Batch is actually just an optimizer hint to get maximum performance out of the BULK INSERT operation. That is a topic for another discussion!
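To make the mapping concrete, here is a rough T-SQL sketch of what those two options translate to (the file path here is just a placeholder, not from this package):

-- Maximum Insert Commit Size maps to BATCHSIZE: commit every
-- 10,000 rows as its own transaction instead of committing the
-- whole file as a single transaction.
BULK INSERT dbo.FactClaim
FROM 'C:\data\claims.dat'
WITH (BATCHSIZE = 10000);

-- Rows Per Batch maps to ROWS_PER_BATCH: an optimizer hint giving
-- the approximate total number of rows in the file.
BULK INSERT dbo.FactClaim
FROM 'C:\data\claims.dat'
WITH (ROWS_PER_BATCH = 1000000);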
Why would you ever NOT use fast load?
With the fast load option, if ANY row in a transaction fails (constraints, PK violations, etc.) that entire transaction will fail to commit. This means that all of the rows that were part of that BULK INSERT operation – the number of rows being set by the Maximum Insert Commit Size – will fail and/or be passed to the OLEDB Destination error row path.
Unfortunately, this means that if you leave the default value for Maximum Insert Commit Size (2147483647) and try to insert 1 million records, and 1 record fails a constraint, all 1 million records will be sent to the error output. Even worse, you can’t get an Error Description from the fast load/BULK INSERT operation. So you would get 1 million records in your error pipeline, not know what the error was, and not know which row caused the error.
Getting the benefits of fast load while still capturing error rows and descriptions
To get the benefits of a fast load but still get the detailed error information of the non-fast load, row by row insert, you can chain 2 OLEDB Destinations together and change the Maximum Insert Commit Size.
In the screen shot above, we are using the first OLEDB Destination with the fast load option into our FactClaim fact table, and we have the Maximum Insert Commit Size set to 10,000 rows. If 1 row in this batch fails, those 10,000 rows will go to the row by row insert – but if none of those 10,000 rows have errors, they will be committed using BULK INSERT via the fast load. So if we were trying to load 1 million records, we would of course end up with 100 BULK INSERT transactions (assuming no errors), but depending on your required design this may be more acceptable than having the entire 1 million records fail or being forced to use non-fast load.
A side benefit of changing the Maximum Insert Commit Size is that it can help to manage tempdb and log space, because smaller batches are committed at one time – the whole 1 million rows would not need to be rolled back if the last row fails.
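If you want to watch that effect while the package runs, the standard DBCC SQLPERF command reports transaction log usage (a general observation tip on my part, not something from the original package):

-- Reports the percentage of each database's transaction log in use;
-- run it during the load to see smaller batches keep log usage down.
DBCC SQLPERF(LOGSPACE);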
Getting the exact error row and error description
Once you send the smaller set of rows from the failed batch to the second OLEDB Destination with non-fast load, SSIS will try to insert those rows one at a time. In this way, only the exact rows that fail will be sent to this second OLEDB Destination’s error output. Thus only the rows that actually fail the insert are left uninserted.
We can then use a script component to add an ErrorDescription column, and capture the actual error description with code similar to the below.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Translate the error code column coming from the second OLEDB
    // Destination's error output into a human-readable description.
    Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.OLEDBDestination1ErrorCode);
}
Finally we can get a row count of the actual errors, and store those error rows, with error description, to a flat file or a table.
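If you choose a table for the error rows, something like the sketch below would work; the table and column names here are my own placeholders (aside from ErrorDescription), not from the original package.

-- Hypothetical error-logging table: one row per failed insert, with
-- the description produced by the script component.
CREATE TABLE dbo.FactClaimLoadErrors
(
    ErrorCode        int            NULL,
    ErrorDescription nvarchar(4000) NULL,
    LoadDate         datetime2(0)   NOT NULL DEFAULT SYSDATETIME()
    -- ...plus the original pipeline columns for the failed row
);

-- Row count of the actual errors from the load:
SELECT COUNT(*) AS ErrorRows FROM dbo.FactClaimLoadErrors;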
By using this method, we can get most of the benefit of using the fast load option (fast loading data with 0 errors in one BULK INSERT would likely be faster), but still be able to capture the exact error row and description.