Thursday, September 23, 2010

SSIS 2008 / SQL Server 2008 Connectivity Issue

We recently upgraded to SSIS 2008 and SQL Server 2008 R2. This helped the performance of a large package that contained 27 lookups for a fact table. It used an OLE DB destination component, set to Table – Fast Load, and it would process a million rows in less than 2 minutes.

This worked well, until I added a Multicast and an OLE DB Command component to the data flow. What I wanted was to check the values passed to the stored procedure and add the existing row to history if there was a change. The OLE DB Command used the same connection manager as the main data insert, since the history and the current tables were in the same database.

When running it, I would get to about 100,000 rows, then SSIS would begin to slow down and finally lose network connectivity to the SQL Server. (We have a separate SSIS server.) This is the error that was written to the logs:

SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred.
Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0"
Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0"
Hresult: 0x80004005 Description: "Communication link failure".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0"
Hresult: 0x80004005 Description: "Named Pipes Provider: No process is on the other end of the pipe."

I began by troubleshooting the network, looking for errors on the switch and both servers. This yielded nothing; there were no errors logged. So I turned to SSIS, figuring that it somehow was the culprit since everything had worked until I added the multicast.

I removed the Multicast and the OLE DB Command components and tested again. The package processed a million rows in less than 2 minutes.

I added the Multicast back in, without the OLE DB Command, and tested. Again it processed a million rows in less than 2 minutes.

I added the OLE DB Command component back in and tested. It failed at 49,000 records.

I added another connection manager, set the same properties, and then pointed the OLE DB Command to the new manager. When I tested, I was surprised to see the package complete successfully!

It seems that the single connection manager was not able to handle the multiple destinations simultaneously. I have no way to confirm this at the moment, but since switching to separate connection managers for the same database, I have not experienced this issue at all.

Wednesday, August 4, 2010

SSRS Fix for Firefox/Chrome Report Size

If you have ever used Chrome or Firefox with SSRS, you know the report window can show up very small. This is due to the way that SSRS implements the IFRAME.

A quick fix for this is to add the following entry to the ReportingServices.css file.
.DocMapAndReportFrame
{
    min-height: 800px;
    min-width: 800px;
}
You can usually find the ReportingServices.css file at C:\Program Files\Microsoft SQL Server\MSSQL.3\Reporting Services\ReportManager\Styles, though the location may differ depending on your installation.

peace

Friday, July 23, 2010

Using Maximum Insert Commit Size with FAST LOAD on the OLE DB Connector, and a Warning

I have been working on some new ETL recently, which of course has me optimizing it. One of the things that I look at while optimizing is the Maximum Insert Commit Size (MICS) on the OLE DB destination when using the Table – FAST LOAD option.

This setting controls how many records are sent and committed to a table at a time. For example, if you have a 12,000 record source and you set MICS to 5,000, you will have 3 bulk insert statements sent to the server: two with 5,000 records and one with 2,000 records.
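The batch arithmetic above can be sketched in a few lines of Python. This is purely illustrative (SSIS obviously does not run Python); the `batch_sizes` helper is a hypothetical name, just to make the split concrete.

```python
def batch_sizes(total_rows, mics):
    """Return the sizes of the bulk insert batches MICS would produce."""
    full, remainder = divmod(total_rows, mics)
    return [mics] * full + ([remainder] if remainder else [])

# The example from the text: a 12,000 record source with MICS = 5,000
print(batch_sizes(12000, 5000))  # [5000, 5000, 2000]
```

An even divide produces no trailing partial batch, e.g. `batch_sizes(10000, 5000)` gives two batches of 5,000.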

The beauty of doing this is that it helps keep the transaction log small if the database uses the simple recovery model. (On staging databases, I routinely use simple recovery since there is no need for full logging.) It can also increase performance, since the server deals with smaller chunks of data at a time.

But, like all good things, there is something you should watch out for. If you set up Redirect Row on error in the OLE DB destination and one record in a batch fails, that one record will be redirected to the error output, but the entire batch will not be inserted into the database! The only indication you will get is that single redirected record.
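To make the failure mode concrete, here is a small Python simulation of the behavior described above, under the assumption that one bad row causes its whole batch to be skipped while only the bad row reaches the error output. This is a sketch of the symptom, not SSIS code.

```python
def simulate_fast_load(rows, mics, is_bad):
    """Model fast load with Redirect Row: a bad row sinks its entire batch."""
    inserted, redirected = [], []
    for start in range(0, len(rows), mics):
        batch = rows[start:start + mics]
        bad = [r for r in batch if is_bad(r)]
        if bad:
            redirected.extend(bad)  # only the bad rows show up on the error output...
            # ...but none of this batch is inserted - the good rows silently vanish
        else:
            inserted.extend(batch)
    return inserted, redirected

rows = list(range(12000))
inserted, redirected = simulate_fast_load(rows, 5000, lambda r: r == 6000)
print(len(inserted), len(redirected))  # 7000 1 -> 4,999 good rows lost with no warning
```

One bad row out of 12,000 costs you an entire 5,000-row batch, and the error output shows only a single record.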

I have tried setting the MICS to 1 for safety, but that ends up being slower than the normal table load.
What I wound up doing is using the normal table load on smaller tables, and removing the redirect row error handling on larger tables. I then put in a derived column step that cleans the data before inserting it. This allows me to load a 46 million row table in a few minutes, with a transaction log that doesn't get above 10 MB.

So, definitely work with the MICS setting to further optimize your ETL, but if you are using error handling to redirect rows, be aware that you will lose data each time a row is redirected.

Wednesday, July 14, 2010

Setting a NULL value in SSIS derived column

Ran into an interesting error while trying to set a NULL value on a column in a derived column task in SSIS.
Here is my statement:
[CustNum]=="" ? NULL(DT_STR, 4, 1252) : [CustNum]

And the error thrown:
For operands of the conditional operator, the data type DT_STR is supported only for input and cast operations.

The correct syntax for this is:

[CustNum]=="" ? (DT_STR, 4, 1252)NULL(DT_STR, 4, 1252) : [CustNum]

Not sure why this requires the explicit cast, since you already include the type in the NULL function.

peace

Thursday, July 8, 2010

Deploying SSIS Packages with a Batch File

This is a handy little batch file that I have been using for years to deploy multiple SSIS packages to a SQL Server. It is pretty basic; it copies all of the *.dtsx files in a specified folder to the SQL Server.
It first removes the packages from the SQL Server, deletes the folder, and then copies the new files to the server. One limitation is that it only deletes the packages that are also in the source folder. So if you have extra packages deployed, you will need to remove them manually.

This batch file uses DTUTIL, so it must be run on a machine that has SSIS installed.

The syntax for running the file is: DeploySSISPackages.cmd (SQL Server name) (Folder containing packages to deploy) (Folder in MSDB to deploy to)

If any of the 3 parameters contain spaces, enclose them in quotes.

Examples:

DeploySSISPackages.cmd SQLServer001 "C:\My Projects\SSIS\ETL" "ETL SSIS"

DeploySSISPackages.cmd SQLServer001 C:\Projects\SSIS\ETL ETL

Enjoy!

@Echo Off

Echo.
Echo.
Echo SSIS Package Installation Script
Echo Written by Bob Pearson
Echo 7/6/07
Echo.

if "%~1" == "" goto Error
if "%~2" == "" goto Error
if "%~3" == "" goto Error

Echo.
Echo.
Echo Deployment Server: %1
Echo -----------------------------------------------------
Echo --This will delete any %3 data mart files
Echo --on the server, and reinstall from the local machine
Echo -----------------------------------------------------
Pause
REM Goto Out


REM Remove Existing files and directory on Server
for %%f in ("%~2\*.dtsx") do (
Echo Now Removing: %%~nf
dtutil /Q /SourceS %1 /SQL "\%~3\%%~nf" /Del
)

dtutil /Q /SourceS %1 /FDe "SQL;\;%~3"

:Create

Echo.
Echo Preparing to create folder
Echo.
pause

REM Create the Directory
dtutil /Q /SourceS %1 /FC "SQL;\;%~3"
if errorlevel 1 goto End
Echo.
Echo Preparing to Copy Files to Server
Echo.
pause

:Out
REM copy the SSIS Packages to the server
for %%f in ("%~2\*.dtsx") do (
Echo Now Copying: %%~nf
dtutil /Q /DestS %1 /Fi "%%f" /C "SQL;\%~3\%%~nf"
)


Echo.
Echo.
Echo Installation Complete!
Echo.
Echo.
Pause
Goto End

:Error
Echo.
Echo.
Echo Missing parameter!
Echo Syntax: DeploySSISPackages.cmd [servername] [Source File Path] [MSDB Deploy Folder]
Echo.
Echo.

Pause

:End

Friday, July 2, 2010

Pulling Data from DB2 into SQL Server using SSIS

Well, I am finally back! I am at a new location, in a new industry, and working with new data! Very exciting times!

One of the new things I am working with is pulling data into SSIS from a DB2 database running on an AS/400. The first thing I did was scour the net for information on which provider to use to pull data from DB2 into SQL Server. I found 4 options:

1. Use a flat file extract from DB2.
2. Use the ODBC provider.
3. Use the iSeries OLEDB provider.
4. Use the Microsoft OLEDB provider.

After some initial testing between the ODBC provider and the iSeries OLEDB provider, I ruled out ODBC. It is much too slow. I ran several small tests, but once I got up to 100k records, it was too painful to even complete the ODBC portion of the tests.

I then turned my attention to comparing the Microsoft OLEDB v3.0 provider to the iSeries OLEDB V5R3M0 provider.

I set up an SSIS package with 2 sequence containers: one that contained the data flow for the MSOLEDB provider and one for the iSOLEDB provider. I set up several SQL SELECT statements against the tables that I will be using in my ETL, with no WHERE clause, just a basic select. I wanted this test to be as simple as possible.
I then put these 2 containers into a For Loop container and set it up to loop 10 times.

Note: I am running on a development server with limited memory (4 GB, shared by a SQL Server instance and SSIS) and 4 processors.

I ran it on the first SQL statement, which returned about 4k rows. The MSOLEDB ran in 298ms, the iSOLEDB in 732ms. I changed the SQL statement to return 400k rows: the MSOLEDB took 83 seconds (1 min, 23 sec) and the iSOLEDB took 514 seconds (8 min, 34 sec). I then went for a big test, 1 million rows: the MSOLEDB took 636 seconds (10 min, 36 sec) and the iSOLEDB took 2,008 seconds (33 min, 28 sec).

So as you can see from these tests, the Microsoft OLEDB for DB2 provider is much faster. But how would it perform on a multimillion-row result set?

I tested the Microsoft OLEDB provider for DB2 against a table that contained over 12 million rows, each row ~500 bytes wide. The flat file import took 15 min, 56 sec; the MSOLEDB took 20 min, 53 sec. However, if I use the OLEDB provider, I do not have to worry about the file not being created, etc. There are fewer moving parts. So for this example, I am going to use the OLEDB provider. I have several tables that have more than 40 million records; I will have to test those out and then decide how much it is worth to not have to rely on external file creation processes.

So, in summary, the new Microsoft OLEDB Provider for DB2 is very speedy. It is a no-brainer to use it instead of the iSeries or ODBC providers. For larger tables, however, flat files may still be the way to go.

peace