Friday, July 2, 2010

Pulling Data from DB2 into SQL Server using SSIS

Well, I am finally back! I am at a new location, in a new industry, and working with new data! Very exciting times!

One of the new things I am working with is pulling data into SSIS from a DB2 database running on an AS/400. The first thing I did was scour the net for information on which provider I should use to pull data from DB2 to SQL Server. I found 4 options:

1. Use a flat file extract from the DB2
2. Use the ODBC provider.
3. Use the iSeries OLEDB provider.
4. Use the Microsoft OLEDB provider.

After some initial testing between the ODBC provider and the iSeries OLEDB, I ruled out the ODBC. It is much too slow. I ran several small tests, but once I got up to 100k records, it was too painful to even complete the ODBC portion of the tests.

I then turned my attention to comparing the Microsoft OLEDB v3.0 provider to the iSeries OLEDB V5R3M0 provider.

I setup an SSIS package with 2 sequence containers: one that contained the flow for the MSOLEDB and one for the iSOLEDB. I setup several SQL SELECT statements from the tables that I will be using in my ETL. I used no WHERE clause, just a basic select. I wanted this test to be as simple as possible.
I then put these 2 containers into a for..next container and set it up to loop 10 times.

Note: I am running on a development server with limited memory (4 gigs, running a SQL server instance and SSIS) and 4 processors.

I ran it on the first SQL statement that would return about 4k rows. The MSOLEDB ran in 298ms, the iSOLEDB in 732ms. I changed the SQL statement to return 400k rows. The MSOLEDB took 83 seconds (1min, 23sec), and the iSOLEDB took 514 seconds (8min, 34sec). I then went for a big test, 1 million rows. The MSOLEDB took 636 seconds (10min, 36sec) and the iSOLEDB took 2008 seconds (33min, 28sec).

So as you can see from these tests, the Microsoft OLEDB for DB2 provider is much faster. However, how would it perform on a multimillion row rowset?

I tested the Microsoft OLEDB provider for DB2 against a table that contained over 12 million rows, with each row being ~500 bytes wide. The flat file import took 15 min, 56sec. The MSOLEDB took 20min, 53sec. However, if I use the OLEDB provider, I do not have to worry about the file not being created, etc. There are less moving parts. So for this example, I am going to use the OLEDB provider. I have several tables that have more than 40 million records. I am going to have to test these out and then decide how much it is worth to not have to rely on external file creation processes.

So, in summary, the new Microsoft OLEDB Provider for DB2 is very speedy. It is a no brainer to use it instead of the iSeries or the ODBC providers. For larger tables however, flat files may still be the way to go.

peace

2 comments:

Unknown said...

HI,
Great article.
Can you tell me how to configure the IOLEDB and the Micorosoft OLEDB? I tried using the Microsoft OLEDB and for some reason it needs a creation of a package on DB2. The db2 dbas dont want to create any packages. Can you help me?

Bob Pearson said...

For the MS OLEDB for DB2 v3.0, there was no package creation needed. My connection string from SSIS looks like this:

Data Source=ServerName;User ID=User ID;Password=Password;Initial Catalog=ServerName;Provider=DB2OLEDB;Persist Security Info=True;Mode=Read;DateTime As Char=False;DateTime As Date=True;Network Address=ServerName;Package Collection=SchemaName;Default Schema=SchemaName;DBMS Platform=DB2/AS400;Read Only=True;Defer Prepare=True;Packet Size=32767;

I am using DB2 on iSeries, v5r4.