Thursday, May 31, 2007

More Dating Problems

Ever have one of those dates that just bugs you whenever you look back on it? I've been having one since last night. More specifically, I've been researching a problem with the way that the SSIS Flat File Source data flow component handles bad dates. The problem became apparent to me because of a thread on the MSDN SSIS forums. And it doesn't handle them particularly well. [1] Here's the deal:

The SSIS flat file component relies on the Ole Automation method VarDateFromStr. This method will attempt to parse date strings in different formats regardless of the locale of the server on which the SSIS package is running. Here's an excerpt from the KB article that describes the underlying problem:

CAUSE
In VarDateFromStr, the code does not strictly check the string passed
against the date format of the default system locale, default user locale, or
the locale passed to the function. The function returns without error if the
passed string is valid in any of the following date formats:
• DD-MM-YYYY
• MM-DD-YYYY
• YYYY-MM-DD

RESOLUTION
If you need to verify that the date is valid based on the specific date
format for a locale, do not depend on the VarDateFromStr function. This implies
that any code relying on the MFC function COleDateTime::ParseDateTime to do
strict checking should be changed also because this MFC function uses
VarDateFromStr internally. Microsoft recommends that you check the string first
in your code before calling VarDateFromStr or COleDateTime::ParseDateTime to
make sure it is a correct date for the date format of the locale. [2]

How does this show up in SSIS? Basically, unless your flat file contains dates in either OLE DB canonical format (yyyy-mm-dd hh:mm:ss) or ISO 8601 format [3], you will need to manually check the validity of your input dates. The simplest way to do this is probably loading the date column as a string and then doing the type casting using a Derived Column transform in your data flow.

The workaround is certainly not difficult, but the problem also isn't particularly obvious. The KB article that describes the problem doesn't mention SSIS (or for that matter, any operating systems more recent than NT4) and my research into the problem didn't yield any fruit through online searching. Hopefully now that we have this blog post and the MSDN forums thread, the details will be easier for everyone to locate...

[1] Personally, I've always handled bad dates by leaving early and stiffing the girl with the check. Some people say that this is inappropriate, but I think that this SSIS behavior is even worse, because it's going to fail silently and who knows when you'll discover it.
[2] Unfortunately, there is no "strict checking" option or the like for the SSIS Flat File source component, so your options here appear to be limited to working around the problem, rather than solving it.
[3] In which case, VarDateFromStr is not called.

2 comments:

Phil Brammer said...

One note. The use of a 4-digit year should solve this problem as well. Which, aren't we all using 4-digit years yet?? ;)

Matthew Roche said...

Yeah, that was one vital aspect of the problem scenario that I neglected to mention here, wasn't it? At least I remembered to link to the Forums thread where this was covered in some depth.

My personal guess is that this will most likely be seen by people who get flat files from legacy systems, which may well not use four-digit years. We developers just need to remember to use all four digits when we're building NEW legacy systems, I suppose...