Friday, July 23, 2010

Using Maximum Insert Commit Size with FAST LOAD on the OLE DB Connector, and a Warning

I have been working on some new ETL recently, which of course has me optimizing it. One of the things I look at while optimizing is the Maximum Insert Commit Size (MICS) on the OLE DB connector when using the Table – Fast Load option.

This setting controls how many records are sent to the server and committed at a time. For example, if you have a 12,000-record source and you set MICS to 5,000, three bulk insert statements are sent to the server: two with 5,000 records and one with 2,000 records.
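
SSIS issues those batches for you behind the scenes, but if you want to see the same mechanic in plain T-SQL, the BATCHSIZE option on BULK INSERT plays the same role as MICS. A minimal sketch, assuming a hypothetical staging table and source file:

    -- Rough T-SQL analog of what MICS controls (hypothetical table and file path).
    -- BATCHSIZE works like MICS: each 5,000-row batch commits as its own
    -- transaction, so a 12,000-row file produces three separate commits.
    BULK INSERT dbo.StagingTable
    FROM 'C:\etl\source.dat'
    WITH
    (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        BATCHSIZE       = 5000,   -- commit every 5,000 rows
        TABLOCK                   -- bulk-load optimization, like the fast-load default
    );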

The beauty of doing this is that it helps keep the transaction log small if you use the simple recovery model. (On staging databases, I routinely use simple recovery, since there is no need for point-in-time restores.) It can also improve performance, since the server deals with smaller chunks of data at a time.
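
If you want to set that up and watch the effect during a load, both pieces are one-liners (the database name here is hypothetical):

    -- Put the staging database in simple recovery so the log truncates on checkpoint.
    ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

    -- Run this while the package executes; with simple recovery and small
    -- commit batches, the log space used should stay low.
    DBCC SQLPERF(LOGSPACE);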

But, like all good things, there is something to watch out for. If you set up Redirect Row on error in the OLE DB Destination and one record in a batch fails, that one record is redirected to the error output, but the entire batch it belonged to is not inserted into the database! The only indication you get is that single failed record.
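
This batch-level loss is easy to reproduce outside of SSIS. The short demo below is not what the destination literally executes, but it shows the same principle at the T-SQL level: the server treats the whole statement as atomic, so one bad row discards its neighbors too.

    -- Hypothetical demo table: one duplicate key rolls back the entire statement.
    CREATE TABLE dbo.BatchDemo (Id int PRIMARY KEY);

    INSERT INTO dbo.BatchDemo (Id)
    VALUES (1), (2), (2);   -- duplicate key on the third row

    -- The duplicate aborts the whole insert, so rows 1 and 2 are lost as well.
    SELECT COUNT(*) AS RowsInserted FROM dbo.BatchDemo;  -- returns 0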

I have tried setting the MICS to 1 for safety, but that ends up being slower than the normal table load. What I wound up doing is using the normal table load on smaller tables, and removing the Redirect Row error handling on larger tables. In its place I put a derived column step that cleans the data before it reaches the destination. This lets me load a 46 million row table in a few minutes, with a transaction log that never gets above 10 MB.
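
The derived column expressions do that cleanup inside the data flow; as a rough T-SQL sketch of the same idea (hypothetical column names, not my actual expressions):

    -- The kind of scrubbing the derived column step performs, so bad values
    -- never reach the destination and never trigger a batch failure.
    SELECT
        LTRIM(RTRIM(CustomerName))            AS CustomerName,  -- trim whitespace
        ISNULL(Region, 'Unknown')             AS Region,        -- replace NULLs
        NULLIF(LTRIM(RTRIM(PostalCode)), '')  AS PostalCode     -- blank -> NULL
    FROM dbo.RawSource;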

So, definitely work with the MICS setting to further optimize your ETL, but if you are using error handling to redirect rows, be aware that an entire batch of data can be lost each time a row is redirected.

2 comments:

Bob Amy said...

To resolve the loss of data when the fast load fails, I redirect to a "slow" load. The fast destination redirects the whole batch of rows, and the slow destination then redirects only the error rows to your error output.

Bob Pearson said...

Thanks for the comment!
That is a good solution, and the same thing that I do. But I keep my commit size at about 10,000 so that when there is an error, the row-by-row insert doesn't hurt the ETL execution time too much.