Friday, July 23, 2010

Using Maximum Insert Commit Size with FAST LOAD on the OLE DB Connector, and a Warning

I have been working on some new ETL recently, which of course has me optimizing it. One of the things I look at while optimizing is the Maximum Insert Commit Size (MICS) on the OLE DB connector when using the Table – FAST LOAD option.

This setting controls how many records are sent and committed to a table at a time. For example, if you have a 12,000 record source and you set MICS to 5,000, three bulk insert statements will be sent to the server: two with 5,000 records and one with 2,000 records.
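To make the batching arithmetic concrete, here is a quick Python sketch. This is purely an illustration of the math above, not anything SSIS itself runs:

```python
# Illustrative only: how a source is split into bulk insert batches
# under a given Maximum Insert Commit Size (MICS).
def batch_sizes(total_records, mics):
    """Return the size of each bulk insert batch sent to the server."""
    full_batches, remainder = divmod(total_records, mics)
    return [mics] * full_batches + ([remainder] if remainder else [])

# The example from the text: 12,000 records with MICS = 5,000
print(batch_sizes(12_000, 5_000))  # [5000, 5000, 2000]
```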

The beauty of doing this is that it helps keep the transaction log small if you use the simple recovery model. (On staging databases, I routinely use simple recovery since there is no need for full logging.) It can also improve performance, since the server is dealing with smaller chunks of data at a time.

But, like all good things, there is something you should watch out for. If you set up Redirect Row on error on the OLE DB Destination and one record in a batch fails, that one record will be redirected to the error output, but the entire batch it belongs to will not be inserted into the database! The only indication you will get is that single record failure.

I have tried setting the MICS to 1 for safety, but that ends up being slower than the normal Table Load.
What I wound up doing is using the normal table load on smaller tables, and removing the Redirect Row error handling on larger tables. I then put in a derived column step that cleans the data before inserting it. This allows me to load a 46-million-row table in a few minutes, with a transaction log that doesn't get above 10 MB.

So, definitely work with the MICS setting to further optimize your ETL, but if you are using error handling to redirect rows, be aware that you will lose data each time a row is redirected.

Wednesday, July 14, 2010

Setting a NULL value in SSIS derived column

Ran into an interesting error while trying to set a null value to a column in a derived column task in SSIS.
Here is my statement:
[CustNum]=="" ? NULL(DT_STR, 4, 1252) : [CustNum]

And the error thrown:
For operands of the conditional operator, the data type DT_STR is supported only for input and cast operations.

The correct syntax for this is:

[CustNum]=="" ? (DT_STR, 4, 1252)NULL(DT_STR, 4, 1252) : [CustNum]

Not sure why this requires the explicit cast, since the type is already specified in the NULL function itself.

Thursday, July 8, 2010

Deploying SSIS Packages with a Batch File

This is a handy little batch file that I have been using for years to deploy multiple SSIS packages to a SQL Server. It is pretty basic; it copies all of the *.dtsx files in a specified folder to the SQL Server.
It first removes the packages from the SQL Server, deletes the folder, and then copies the new files to the server. One limitation is that it only deletes the packages that are in the source folder. So if you have extra packages deployed, you will need to remove them manually.

This batch file uses DTUTIL, so it must be run on a machine that has SSIS installed.

The syntax for running the file is: DeploySSISPackages.cmd (SQL Server name) (Folder containing packages to deploy) (Folder in MSDB to deploy to)

If any of the 3 parameters contain spaces, enclose them in quotes.


DeploySSISPackages.cmd SQLServer001 "C:\My Projects\SSIS\ETL" "ETL SSIS"

DeploySSISPackages.cmd SQLServer001 C:\Projects\SSIS\ETL ETL


@Echo Off

Echo SSIS Package Installation Script
Echo Written by Bob Pearson
Echo 7/6/07

REM Bail out if any of the three parameters is missing
if "%~1"=="" goto Error
if "%~2"=="" goto Error
if "%~3"=="" goto Error

Echo Deployment Server: %1
Echo -----------------------------------------------------
Echo --This will delete any %~3 data mart files
Echo --on the server, and reinstall from the local machine
Echo -----------------------------------------------------

REM Remove the existing packages on the server
for %%f in ("%~2\*.dtsx") do (
Echo Now Removing: %%~nf
dtutil /Q /SourceS %1 /SQL "\%~3\%%~nf" /Del
)

REM Remove the (now empty) folder in MSDB
dtutil /Q /SourceS %1 /FDe "SQL;\;%~3"

Echo Preparing to create folder

REM Create the folder in MSDB
dtutil /Q /SourceS %1 /FC "SQL;\;%~3"
if errorlevel 1 goto End
Echo Preparing to Copy Files to Server

REM Copy the SSIS packages to the server
for %%f in ("%~2\*.dtsx") do (
Echo Now Copying: %%~nf
dtutil /Q /DestS %1 /Fi "%%f" /C "SQL;\%~3\%%~nf"
)

Echo Installation Complete!
Goto End

:Error
Echo Missing parameter!
Echo Syntax: DeploySSISPackages.cmd [servername] [Source File Path] [MSDB Deploy Folder]

:End



Friday, July 2, 2010

Pulling Data from DB2 into SQL Server using SSIS

Well, I am finally back! I am at a new location, in a new industry, and working with new data! Very exciting times!

One of the new things I am working with is pulling data into SSIS from a DB2 database running on an AS/400. The first thing I did was scour the net for information on which provider I should use to pull data from DB2 to SQL Server. I found 4 options:

1. Use a flat file extract from DB2.
2. Use the ODBC provider.
3. Use the iSeries OLEDB provider.
4. Use the Microsoft OLEDB provider.

After some initial testing between the ODBC provider and the iSeries OLEDB, I ruled out the ODBC. It is much too slow. I ran several small tests, but once I got up to 100k records, it was too painful to even complete the ODBC portion of the tests.

I then turned my attention to comparing the Microsoft OLEDB v3.0 provider to the iSeries OLEDB V5R3M0 provider.

I set up an SSIS package with 2 sequence containers: one that contained the flow for the MSOLEDB and one for the iSOLEDB. I set up several SQL SELECT statements against the tables that I will be using in my ETL. I used no WHERE clause, just a basic SELECT; I wanted this test to be as simple as possible.
I then put these 2 containers into a parent container and set it up to loop 10 times.

Note: I am running on a development server with limited memory (4 GB, shared by a SQL Server instance and SSIS) and 4 processors.

I ran it on the first SQL statement that would return about 4k rows. The MSOLEDB ran in 298ms, the iSOLEDB in 732ms. I changed the SQL statement to return 400k rows. The MSOLEDB took 83 seconds (1min, 23sec), and the iSOLEDB took 514 seconds (8min, 34sec). I then went for a big test, 1 million rows. The MSOLEDB took 636 seconds (10min, 36sec) and the iSOLEDB took 2008 seconds (33min, 28sec).
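Putting the larger runs above on a common footing, here is a small Python sketch of my own that converts each run into approximate rows per second. The numbers are just the row counts and run times reported above:

```python
# Rough rows-per-second for the test runs described above.
runs = [
    ("MSOLEDB", 400_000, 83),      # 1 min 23 sec
    ("iSOLEDB", 400_000, 514),     # 8 min 34 sec
    ("MSOLEDB", 1_000_000, 636),   # 10 min 36 sec
    ("iSOLEDB", 1_000_000, 2008),  # 33 min 28 sec
]
for provider, rows, seconds in runs:
    print(f"{provider}: ~{rows / seconds:,.0f} rows/sec")
```

The Microsoft provider comes out several times faster in both runs, which matches the raw timings.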

So as you can see from these tests, the Microsoft OLEDB for DB2 provider is much faster. However, how would it perform on a multimillion row rowset?

I tested the Microsoft OLEDB provider for DB2 against a table that contained over 12 million rows, with each row being ~500 bytes wide. The flat file import took 15min, 56sec. The MSOLEDB took 20min, 53sec. However, if I use the OLEDB provider, I do not have to worry about the file not being created, etc. There are fewer moving parts. So for this example, I am going to use the OLEDB provider. I have several tables that have more than 40 million records. I am going to have to test these out and then decide how much it is worth to not have to rely on external file creation processes.

So, in summary, the new Microsoft OLEDB Provider for DB2 is very speedy. It is a no-brainer to use it instead of the iSeries or ODBC providers. For larger tables, however, flat files may still be the way to go.