Monday, May 28, 2007

Having trouble getting a Date?

There have been several questions posted lately on the MSDN SSIS Forums asking how to convert integer or string date values into "real" DATETIME values in an SSIS package. I figured that this would already be over-documented online, but as the first result I found online (from Kirk Haselden, no less!) didn't actually work, I thought I might as well blog on this myself.
So how do you do this? The key [1] is using the Derived Column transformation in your package data flow, and using the correct expression. Your package's data flow should look something like this [2]:



What's not obvious from the image is what the Derived Column transform is actually doing. This depends, of course, on the input value. Here's the first question from the forums:

My source database stores dates as integers (e.g. 20070101). I need to
convert to a "real" date for my target system.

I'm guessing I need
to create a derived column - could someone help me out with the appropriate
expression?
In this scenario, you must first convert the integer "date" to a string, and then parse that string to get it into the correct format, and then cast that string into a date. Here's the expression:
(DT_DATE)((SUBSTRING((DT_WSTR,8)IntegerDate,5,2) + "-" +
SUBSTRING((DT_WSTR,8)IntegerDate,7,2) + "-" +
SUBSTRING((DT_WSTR,8)IntegerDate,1,4)))
The next question from the forums was this:
Is it possible to convert a date stored as a string into a datetime with
integration services 2005? My attempts with the “data conversion” fail. The
string type form of the date is ‘yyyy-mm-dd’ and the desired result for use in a
Union All is ‘dd/mm/yyyy 12:00:00AM.’
The only thing different between this problem and the one before is in the expression required to perform the conversion. In this case, since we have a string input already, the expression is a little simpler:
(DT_DATE)((SUBSTRING(StringDate,6,2) + "-" + SUBSTRING(StringDate,9,2) + "-" +
SUBSTRING(StringDate,1,4)))
To test these expressions, I created a few simple tables and populated them with sample data. Each test scenario got three valid date values, and one value that is valid for the underlying storage type (integer or string) but not a valid date. Here are the SQL Scripts:

CREATE TABLE Dates
(
IntegerDate INT NOT NULL,
StringDate CHAR(8) NOT
NULL
)
GO
INSERT INTO Dates VALUES (20081231, '20081231')
INSERT
INTO Dates VALUES (20070101, '20070101')
INSERT INTO Dates VALUES (20010130,
'20010130')
INSERT INTO Dates VALUES (20051232, '20051232')
With this as the source, we end up with results like this when we run the package:
[3]

It's vitally important to have both a success path and an error path coming from your Derived Column transform. There is no domain integrity in the source column, so there is no way to be certain that the values you're getting from the source can be successfully converted into dates. Unless you set up your package like this, the first time you get a bad date, your package will fail, instead of handling the error gracefully.


[1] Actually, the key is always using the best tool for the job. If you're getting your data from a relational database, you should probably just do the conversion there using CAST or CONVERT. This will make for a simpler package and usually better performance as well.
[2] Although in the real world, you may want to actually write your records to some destination instead of just counting the rows.
[3] When was the last time a developer got four dates and three of them were good and only one of them was bad? Perhaps I should have come up with a more realistic scenario, eh?

8 comments:

gal said...

hi there..great info..but how about if the date is already in date datatype and the time is in time datatype....
exp:
mydate : 2008-11-13 (date)
mytime : 03:05:02 (time)

convert this to :
2008-11-13 03:05:02(datetime)

my expresion of
(DT_DATE)(DT_STR,20,1252)mydate + " " + (DT_STR,20,1252)mytime
didt work...y ya?pls help

Matthew Roche said...

Try this instead:

(DT_DATE) ((DT_STR, 20, 1252) @[User::mydate] + " " + (DT_STR, 20, 1252) @[User::mytime])

Essentially you need to have two strings before you can concatenate them into a single string that can then be cast into a date. The placement of the parentheses force the concatenation to take place first.

rich said...

Hi and thanks for the blog..
i followed your example and am still having a problem
Code in the derived column task:
(DT_DATE)((SUBSTRING((DT_WSTR,8)PolicyEff,5,2) + "-" + SUBSTRING((DT_WSTR,8)PolicyEff,7,2) + "-" + SUBSTRING((DT_WSTR,8)PolicyEff,1,4)))

example of data from data source:

20080101
20080101
20080101
20080101
20080514

error msg:
[Derived Column 2 [12869]] Error: An error occurred while attempting to perform a type cast.

rich said...

any ideas/suggestions?

Amol Chaudhari said...

Hi Matthew Roche,

Nice Article. actually i have some different req. as i have date column as 021009 in DDMMYY format. is it possible that i can do only date validation?

Means i want to check for that column value,if it is a valid date or not. If it valid date then i want to insert that same(021009 in DDMMYY) value in database table either it should add that row to error file?

Is it possible? please give me some hint. It will be very thankful..

thanks in advance.

-amol

Matthew Roche said...

Cho - I would suggest adding a Derived Column to your data flow that converts your date string into a date, and redirect error rows to a different data flow path. If the string is not a valid date, then it should error and you can handle it based on your business needs.

Amol Chaudhari said...
This comment has been removed by the author.
Amol Chaudhari said...

Hi Matthew,

Thanks for the reply. I really appreciate it.

I am new to SSIS and has got the task to load the flat file into database tables.

After flat file source there is a derived column control for date column validation. Derived column output goes to OLEDB destination. If there are no date errors then all the records are inserted into database table. but if something goes wrong with date then that row is inserted into the error reporting file which is a flat destination file set on red arrow from Derived column control.

Date validation is done as follows in Derived column control :

StartDate Replace 'StartDate' ISNULL((DT_DATE)((SUBSTRING(StartDate,1,2) + "-" + SUBSTRING(StartDate,3,2) + "-" + SUBSTRING(StartDate,5,2)))) ? "" : StartDate string [DT_STR] 7 1252 (ANSI - Latin I)


Above start date Null insertion into database table issue is resolved now but i am getting another issue :

when the date is not in correct format that record is inserted into the flat file destination which is the error reporting file. But when the date 231309 it is putting the full record containg this date but the date field is coming blank.

Flat destination file is connected with Red arrow from Derived column control and all the columns are mapped correctly. But when the date is not valid, its getting the error and putting that record in error reporting file with blank value for StartDate column.

Please provide ur views on this... it will be very helpful to me.

Thanks in advance..

-Regards,
Amol Chaudhari