Showing posts with label datetime. Show all posts

Friday, March 6, 2009

Last Update Dates

It's been a while... We are well into deployment and finalization of this version of our ETL and reporting system. Things are going well.

Recently I ran into an interesting ETL problem while using a source system "last update" field. Let me give you some background.

We have an ETL process that reads from a source system that was developed in-house. Every extract query filtered on the last update field of the table it read.

While in UAT, several reports were flagged as missing rows. After investigating, it appeared that the rows had never made it to the data mart. Needless to say, this was very worrisome.

I researched and tried to find out why these rows were excluded. There seemed to be no pattern, just random rows.

While looking at my morning logs, I noticed something strange. The ETL Last Update table showed times about 5 hours after the ETL had run. They should have matched the time of the run.

I looked in the source system, and there were 3 rows that had update dates in the future! 5 hours to be exact!

It turns out that under certain circumstances, the source system was using the wrong date/time to update the last update field. That date/time was in GMT, so it was 5 hours in the "future" when it was applied to the last update field.

The result was missed records in the ETL. It would miss 5 hours' worth of updates any time this occurred in the source system.

So my recommendation, which I am now kicking myself for not implementing to begin with, is this: Always use a date range, not just a "Greater than" for last update fields. For example:

Where LastUpdate Between '3/5/09 11:00:00' and GetDate()
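
To make the failure and the fix concrete, here is a small Python simulation. It is only a sketch, not our actual ETL code. It assumes the ETL saved the largest LastUpdate it read as the starting point for the next run, which is my reading of why the ETL Last Update table showed future times. The function names, row ids, and times are all made up for illustration, and the bounded version uses "greater than start, up to and including end" rather than a strict Between so that back-to-back windows do not read the boundary row twice.

```python
from datetime import datetime

def extract_open_ended(rows, watermark):
    """Original approach: LastUpdate > watermark, with no upper bound.
    Assumption: the next watermark is the largest LastUpdate read."""
    picked = [r for r in rows if r["last_update"] > watermark]
    new_watermark = max([r["last_update"] for r in picked], default=watermark)
    return picked, new_watermark

def extract_bounded(rows, watermark, run_time):
    """Date-range approach: watermark < LastUpdate <= run_time.
    The next watermark is the run time, never a value from the source."""
    picked = [r for r in rows if watermark < r["last_update"] <= run_time]
    return picked, run_time

def at(hour, minute=0):
    # Hypothetical timestamps, all on the same day, in local time.
    return datetime(2009, 3, 5, hour, minute)

source = [
    {"id": 1, "last_update": at(10, 30)},
    # Really changed at 10:45 local, but stamped with GMT (5 hours ahead).
    {"id": 2, "last_update": at(15, 45)},
]
# A correctly stamped row that arrives after the first ETL run.
later_row = {"id": 3, "last_update": at(12, 0)}

# Open-ended: the future-dated row drags the watermark to 15:45,
# so the 12:00 row is never picked up.
run1, wm = extract_open_ended(source, at(6))
run2, wm = extract_open_ended(source + [later_row], wm)
open_ids = [r["id"] for r in run1 + run2]

# Bounded: runs at 11:00, 13:00 and 16:00. The future-dated row is
# skipped until the clock catches up with it, and nothing is lost.
b1, bwm = extract_bounded(source, at(6), at(11))
b2, bwm = extract_bounded(source + [later_row], bwm, at(13))
b3, bwm = extract_bounded(source + [later_row], bwm, at(16))
bounded_ids = [r["id"] for r in b1 + b2 + b3]

print("open-ended picked:", open_ids)
print("bounded picked:   ", bounded_ids)
```

In this sketch the open-ended version never loads row 3, while the bounded version loads all three rows. The bad row still arrives late, but it no longer hides the good ones.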


And the other rule... Never trust the source system to be accurate 100% of the time. Anticipate issues like this.

Anyway, that's all for now.

peace

Wednesday, December 10, 2008

DateTime Columns in Slowly Changing Dimension Component

We had an interesting error when implementing a slowly changing dimension in SSIS this week.

We had a date column that we were passing through to the database from the source. However, the source was a script component because the data came in from a multi-resultset stored procedure. In other data flows, where we were using an OLE DB destination, we typed datetime columns as DT_DATE. However, when we used DT_DATE with the slowly changing dimension component, it threw an error:
Error at Import Data [Slowly Changing Dimension]: The input column "input column "ALDATE (15157)" cannot be mapped to external column "external column "ALDATE (15160)" because they have different data types. The Slowly Changing Dimension transformation does not allow mapping between column of different types except for DT_STR and DT_WSTR.
After much digging and research, we determined that we had to set the output column type on the script component to DT_DBTIMESTAMP. Once we changed that, the SCD worked just fine!

peace