Friday, December 5, 2008

Importing CSV Files with SSIS

One of the shortcomings of the comma separated value (CSV) file import in SSIS is the inability for SSIS to determine if there are too few fields in a record.

According to Microsoft, they only check for the end of record indicator (Ususally a CR/LF) when processing the last column in the file. If you are not on the last column, and a CR/LF is encountered, it is ignored and the first field of the new line is added to the last column of the previous line.

Here is an example of what you see:

SSIS is set up to accept 3 columns in a CSV file: Col1, Col2 and Col3.

File comes in like this:


When ssis imports this file it produces this result:
Record 1: A1, A2B1, B2
Record 2: C1,C2 Error - Column Delimiter Not Found
Now, when troubleshooting this error, you can be looking in the wrong place, since the error usually shows up only at the end of the file.

So what I have been doing to gracefully catch and report on this is to quickly pre-process the file to count the columns. Since the processes that create the CSV files for my app are automated, I can assume that every record in the file has the same number of fields, and I only need to count 1 row. If you cannot assume this, it would be just as easy to loop thru each row and determine the field count.

I would also like to point out that my code does not yet handle commas embedded within quote delimited fields. For example, if your record looked like this:
1,"this is um, the field", 1,1
I will be posting code for that next week sometime.

Ok, so here is what I am doing.

I set up a script task in the control flow, before I try to open the file with ssis. This is the code I use:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Imports System.IO

Public Class ScriptMain

Public Sub Main()

Dim sr As StreamReader
Dim Textline As String
Dim Fields As String()

sr = File.OpenText("C:\FileName.csv")

If Not sr.EndOfStream Then
sr.ReadLine() 'Skip 1 row (Header)
End If

If Not sr.EndOfStream Then
Textline = sr.ReadLine() 'Read in the entire line of text.
Fields = Textline.Split(Convert.ToChar(",")) ' Split the line of text into a string array.

Dts.TaskResult = Dts.Results.Failure 'There were no records after the header. This might not be a failure in all implementations.
Exit Sub
End If


If Fields.Length <> 25 Then ' If there are not 25 fields in the record, raise error.
Dts.TaskResult = Dts.Results.Failure
Err.Raise(vbObjectError + 513, Nothing, "Field Count is Invalid! 25 expected, " + Fields.Length.ToString() + " received.")
Exit Sub
End If

Dts.TaskResult = Dts.Results.Success

End Sub

End Class

When this code runs, it generates an error if the field count is not correct.

If you need to check multiple file layouts, you can put the script task into a foreach container in SSIS and use database lookups to get the field length based on the filename, making this code totally reusable!

Also note... I actually converted this code into a C# dll and put it in the GAC on the SSIS server so all I have to do is call that dll in my script task. I will blog about that soon.

Hope this helps!


1 comment:

dexter said...

Boss thank you very much ,you saved a lot of time buy ur post.I had to make some changes in the process but the draft was good .