Thursday, May 31, 2007

Architecture: The right tool for the job

Way back when, almost 10 years ago now, I got my first Microsoft Certified Solution Developer (MCSD) certification. In addition to passing exams on Visual Basic 5.0 and SQL Server 6.5 database development, I also needed to pass two “Architecture” exams. [1] I remember walking out of these exams thinking “what a joke,” because it seemed that most of the questions involved picking a different Microsoft product from a short list. When you compared them to the VB5 exam [2] they just didn’t measure up. Who cares about picking the right product, I thought – being able to write solid code is much more important…

And I’m certainly not going to disparage the importance of writing solid code. But as time has gone on (and as Microsoft’s product offerings have grown many times broader and more complex) I’ve come to understand why these exams were the cornerstone of Microsoft’s premier[3] developer certification. And that’s because the best code in the world isn’t going to be worth much if it’s built using the wrong tool.

There have been several recent posts on the MSDN SSIS Forums that have made me think of this. Some of them have taken the format “I’m trying to do this thing that is practically copied and pasted from an SSIS white paper, but it doesn’t work – can SSIS do this?” These appear to me to be the result of both a lack of product knowledge – what is the tool supposed to do – and a lack of detailed implementation and feature knowledge – how do I use the tool to make it do what I need. For these posts it’s usually enough to point the poster to the right feature – tasks, transforms, expressions and so on.

Other posts have taken the opposite format: “I’m trying to use SSIS to do something that it can probably do well, but which will be 1,000 times more complex than it needs to be – can you please help me force this giant square peg into this tiny round hole?” For these posts, it’s usually best to point out how they can solve the problem with a simple C# app or a T-SQL query or the like. Even though I love SSIS, it’s not the right tool for every job.

So back to my point. [4] It’s vitally important to know these things:

  • What tools are available
  • What each tool does
  • What tasks are appropriate for each tool
  • What tasks are not appropriate for each tool
  • How to perform tasks with each tool
  • How and where to find out the other things when all else fails
That’s quite the list, isn’t it?

I think that this is a big portion [5] of what makes an architect an architect, and not simply a developer. It’s not enough to draw boxes and arrows and look only at the big picture. But it’s also not enough to write the best multithreaded listener either. An architect needs to be able to do both, and to help others do both if he (or she) is going to succeed.

And now I’m done. I don’t know if I said what I was hoping to say, but I’m pretty sure I’m not going to get any more coherent tonight. All the work I needed to get done is done, and sleep’s siren call is getting more seductive by the minute…

[1] That’s right – 70-160 (Microsoft® Windows® Architecture I) and 70-161 (Microsoft® Windows® Architecture II)

[2] Yep – 70-165 (Microsoft® Visual Basic 5.0 Programming)!

[3] This was the case then, anyway. Microsoft’s certification story has grown in breadth and complexity as well, and it’s also gotten a lot more relevant. But that’s a story for another post…

[4] Yes, I do have a point. But it’s almost 02:00 and I’m still not done with my Exchange 2007 migration and ISA 2006 upgrade, so while I wait, I type, and ramble as I fend off sleep.

[5] Another big portion is communication and soft skills, but that’s another story as well.

SQL Agent Man needs a Perm

No, not Ted Malone. And no, not curly hair. But yes, I'm serious.

If you're using SQL Server Agent jobs to execute your SSIS packages on a schedule (and this is something many of us do) then you need to ensure that SQL Server Agent has the necessary permissions, or else you'll get an error like this:
"Executed as user: server\SYSTEM. The package execution failed. The step
failed."


I've seen this problem popping up several times lately on the MSDN SSIS Forums, so I thought I'd add a quick note here. There are two main things that you can do to get around this:
  1. Configure the SQL Server Agent service to run in the context of a Windows account that has the necessary permissions. This is simple enough, but doesn't give very granular control, as this will apply to all jobs that run.
  2. Create a SQL Server Agent Proxy and grant that proxy account the necessary permissions.

SQL Server Agent Proxies are new in SQL Server 2005, so not everyone knows about them yet. In my opinion they are the right tool for the job here. They're easy to set up, provide granular access and let you easily follow the principle of least privilege when configuring all of your scheduled SSIS packages.
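
If you go the proxy route, the setup boils down to a handful of T-SQL calls against msdb. Here's a rough sketch - the credential, proxy and account names below are made up, so substitute your own:

-- 1. Create a credential that maps to the Windows account that has the permissions the package needs
CREATE CREDENTIAL SsisPackageCredential
WITH IDENTITY = N'DOMAIN\SsisPackageAccount',
     SECRET = N'StrongPasswordHere';
GO

-- 2. Create the SQL Server Agent proxy on top of that credential
EXEC msdb.dbo.sp_add_proxy
    @proxy_name = N'SSIS Package Proxy',
    @credential_name = N'SsisPackageCredential',
    @enabled = 1;

-- 3. Allow the proxy to be used for SSIS job steps
EXEC msdb.dbo.sp_grant_proxy_to_subsystem
    @proxy_name = N'SSIS Package Proxy',
    @subsystem_name = N'SSIS';

-- 4. Let the (non-sysadmin) login that owns the job use the proxy
EXEC msdb.dbo.sp_grant_login_to_proxy
    @login_name = N'DOMAIN\JobOwnerLogin',
    @proxy_name = N'SSIS Package Proxy';

After that, edit the job step that runs the package and pick the new proxy in the "Run as" drop-down instead of the SQL Server Agent Service Account.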

More Dating Problems

Ever have one of those dates that just bugs you whenever you look back on it? I've been having one since last night. More specifically, I've been researching a problem with the way that the SSIS Flat File Source data flow component handles bad dates - and it doesn't handle them particularly well. [1] The problem became apparent to me because of a thread on the MSDN SSIS forums. Here's the deal:

The SSIS flat file component relies on the OLE Automation function VarDateFromStr. This function will attempt to parse date strings in different formats regardless of the locale of the server on which the SSIS package is running. Here's an excerpt from the KB article that describes the underlying problem:

CAUSE
In VarDateFromStr, the code does not strictly check the string passed against the date format of the default system locale, default user locale, or the locale passed to the function. The function returns without error if the passed string is valid in any of the following date formats:
• DD-MM-YYYY
• MM-DD-YYYY
• YYYY-MM-DD

RESOLUTION
If you need to verify that the date is valid based on the specific date format for a locale, do not depend on the VarDateFromStr function. This implies that any code relying on the MFC function COleDateTime::ParseDateTime to do strict checking should be changed also because this MFC function uses VarDateFromStr internally. Microsoft recommends that you check the string first in your code before calling VarDateFromStr or COleDateTime::ParseDateTime to make sure it is a correct date for the date format of the locale. [2]

How does this show up in SSIS? Basically, unless your flat file contains dates in either OLE DB canonical format (yyyy-mm-dd hh:mm:ss) or ISO 8601 format [3], you will need to manually check the validity of your input dates. The simplest way to do this is probably loading the date column as a string and then doing the type casting using a Derived Column transform in your data flow.
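
As an aside, if you happen to be staging the raw flat file data in SQL Server anyway, another option (assuming your dates really are supposed to arrive in yyyy-mm-dd form) is to let the database flag the strings that won't convert cleanly before you cast anything. A rough sketch, with hypothetical table and column names:

-- Evaluate ISDATE under an explicit date format so the check is deterministic
SET DATEFORMAT ymd;

SELECT StagedDate,
       CASE WHEN ISDATE(StagedDate) = 1
            THEN CONVERT(DATETIME, StagedDate, 120)
            ELSE NULL
       END AS ParsedDate
FROM dbo.StagingDates;

Rows that come back with a NULL ParsedDate are your bad dates, and you can route or report on them however you like.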

The workaround is certainly not difficult, but the problem also isn't particularly obvious. The KB article that describes the problem doesn't mention SSIS (or for that matter, any operating systems more recent than NT4), and my online searching didn't turn up anything else useful. Hopefully now that we have this blog post and the MSDN forums thread, the details will be easier for everyone to locate...

[1] Personally, I've always handled bad dates by leaving early and stiffing the girl with the check. Some people say that this is inappropriate, but I think that this SSIS behavior is even worse, because it's going to fail silently and who knows when you'll discover it.
[2] Unfortunately, there is no "strict checking" option or the like for the SSIS Flat File source component, so your options here appear to be limited to working around the problem, rather than solving it.
[3] In which case, VarDateFromStr is not called.

Wednesday, May 30, 2007

I knew there was a reason I had a Windows Mobile phone...

This may well be the coolest thing that I've ever seen as far as mobile apps are concerned: an interactive map of the Orlando convention center (yes, where I will be next week at TechEd!) that uses the camera of your Windows Mobile phone. Basically you use it to take a picture of one of the "signposts" that they will have up all over, and it shows you where you are and what sessions are going on near you. Check it out:

http://bmarb.spaces.live.com/Blog/cns!91B9647DD75C91F9!446.entry

If they integrate it in with individual attendees' schedules (so I can see not just what sessions are going on, but which of the sessions I've added to my schedule are going on) I might lose it because it's too damned cool.

Wow.

Iron Architect

If my memory serves me correctly, the architects of Redmond, Washington have a saying:

"Success comes not from the heart, but from the proper application of current
Microsoft technology."[1]


What if Chairman Kaga had a fetish not for food, but for software architecture? He'd be in Orlando next week at TechEd 2007, and his secret ingredient would be... attendee lodging and social networking!

Microsoft is hosting its Iron Architect competition again in Orlando next week, giving all TechEd attendees the opportunity to win a Microsoft Certified Architect certification[2] by developing, presenting and defending an architectural design to fulfill the requirements posted yesterday on the Iron Architect blog. Sounds like fun, right?

So I'm considering putting together a design and presenting it next week. I don't know if I'll have the time between now and then to get it done right, but it can't hurt to try, right?

[1] Yes, I just made this up. No, it's not really anything that they say at Microsoft.
[2] No, you can't actually win the certification itself - you need to earn that. What you actually win is the cost of going through the certification process, which is valued around $10,000 USD.

Tuesday, May 29, 2007

Speak with a Geek!

Although visitors to the SSIS Forums on MSDN know that it's really easy to ask me questions about SSIS, it's about to get even easier. I've been contacted by one of the masterminds behind the geekSpeak webcast series on MSDN to be a guest on the show next month talking about real-world SSIS. Attendees can submit questions in a talk-show-style format during the show.

We haven't set a date yet, but it will probably take place some Wednesday in June. If you have SSIS questions and would like to participate in a fun, live event, start getting ready now, and I'll keep you posted as I have more news to share.

Monday, May 28, 2007

More Dating Advice

There must be some sort of synchronicity at work - just this morning I was blogging on how to convert different string and integer date representations into "real" datetime values, and then this afternoon I found myself needing to include a "seconds formatted as time" column in a view I was building on top of an audit dimension table. This is something I've done many times before, so I figured I could just Google it and get some sample code from any one of a million places online.

But this was not the case. I couldn't find any samples at all. So I dug through my old project code (and dug, and dug, and dug) until I found what I was looking for. And then I had to update it to include millisecond support, because the old code I found didn't go to this grain. Anyway, here's the deal:

1) Start with a table that has a start time and an end time column, like so:




CREATE TABLE DateFormatTemp
(
    [ExecStartDT] DATETIME NOT NULL,
    [ExecStopDT] DATETIME NULL
)


This looks pretty familiar, right? And it's very simple to DATEDIFF these two fields to find out how many seconds (or whatever) there were between the two dates. But your users (and that probably includes you, too) don't want to have to translate a second count into hours, minutes and seconds. I know I don't, so...

2) Use T-SQL DATEDIFF, CASE expressions and string concatenation to format the results into something more useful [1]:




SELECT [ExecStartDT]
      ,[ExecStopDT]
      ,DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) AS ExecutionTimeInSeconds
      ,CASE -- Hours (derived from the total seconds, so partial hours aren't over-counted)
           WHEN DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 3600 >= 10 THEN
               CAST (DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 3600 AS CHAR (2))
           ELSE
               CAST ('0' + CAST (DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 3600 AS CHAR (1)) AS CHAR (2))
       END + ':' +
       CASE -- Minutes
           WHEN (DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 60) % 60 >= 10 THEN
               CAST ((DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 60) % 60 AS CHAR (2))
           ELSE
               CAST ('0' + CAST ((DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) / 60) % 60 AS CHAR (1)) AS CHAR (2))
       END + ':' +
       CASE -- Seconds
           WHEN DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 60 >= 10 THEN
               CAST (DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 60 AS CHAR (2))
           ELSE
               CAST ('0' + CAST (DATEDIFF (ss, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 60 AS CHAR (1)) AS CHAR (2))
       END +
       CASE -- Milliseconds
           WHEN DATEDIFF (ms, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 1000 < 10 THEN
               '.00' + CAST (DATEDIFF (ms, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 1000 AS VARCHAR)
           WHEN DATEDIFF (ms, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 1000 < 100 THEN
               '.0' + CAST (DATEDIFF (ms, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 1000 AS VARCHAR)
           ELSE
               '.' + CAST (DATEDIFF (ms, [ExecStartDT], ISNULL ([ExecStopDT], GETDATE())) % 1000 AS VARCHAR)
       END AS ExecutionTimeFormatted
FROM [DateFormatTemp]
ORDER BY ExecutionTimeInSeconds ASC;
When you execute this you get back something that looks vaguely[2] like this:




ExecStartDT      ExecStopDT              Exec... ...Formatted
---------------- ----------------------- ------- ------------
2007-05-28 17:00 2007-05-28 17:03:13.817 193     00:03:13.816
2007-05-28 17:00 2007-05-28 17:03:31.323 211     00:03:31.323
2007-05-28 17:00 2007-05-28 17:05:24.547 324     00:05:24.546
2007-05-28 16:30 2007-05-28 17:04:10.920 2050    00:34:10.920
2007-05-28 16:30 2007-05-28 17:05:19.193 2119    00:35:19.193
2007-05-28 16:20 2007-05-28 17:04:05.973 2645    00:44:05.973
2007-05-28 16:10 2007-05-28 17:03:59.680 3239    00:53:59.680
2007-05-28 16:00 2007-05-28 17:03:42.323 3822    01:03:42.323
2007-05-28 15:00 2007-05-28 17:03:46.710 7426    02:03:46.710
(9 row(s) affected)
This is much more useful than just having the difference in seconds.

Of course, you can put this code in a T-SQL UDF, or have MUCH simpler code that does the same thing in a SQL CLR UDF, but in my case I only need this logic once in each database, and do not want the added complexity of managing SQL CLR anything, so this does exactly what I need.
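
If you do decide to wrap this up in a T-SQL UDF, here's a minimal sketch of what mine would probably look like (the function name is just what I'd pick, and because everything is derived from a single millisecond total, it assumes the elapsed time is short enough not to overflow DATEDIFF - not a problem for package execution times):

CREATE FUNCTION dbo.ufn_FormatElapsedTime
(
    @StartDT DATETIME,
    @StopDT DATETIME
)
RETURNS VARCHAR(16)
AS
BEGIN
    -- Treat a NULL stop time as "still running"
    DECLARE @TotalMs BIGINT;
    SET @TotalMs = DATEDIFF(ms, @StartDT, ISNULL(@StopDT, GETDATE()));

    -- Derive hours, minutes, seconds and milliseconds from the single total
    RETURN RIGHT('0' + CAST(@TotalMs / 3600000 AS VARCHAR(10)), 2) + ':'
         + RIGHT('0' + CAST((@TotalMs / 60000) % 60 AS VARCHAR(10)), 2) + ':'
         + RIGHT('0' + CAST((@TotalMs / 1000) % 60 AS VARCHAR(10)), 2) + '.'
         + RIGHT('000' + CAST(@TotalMs % 1000 AS VARCHAR(10)), 3);
END
GO

With that in place, the big SELECT above collapses down to a single call to dbo.ufn_FormatElapsedTime([ExecStartDT], [ExecStopDT]).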

[1] Useful like the formatting here is not. This code is just too wide to fit nicely here.

[2] I say "vaguely" because I've edited the column headings and start times to make the values fit better on the blog.

Having trouble getting a Date?

There have been several questions posted lately on the MSDN SSIS Forums asking how to convert integer or string date values into "real" DATETIME values in an SSIS package. I figured that this would already be over-documented online, but since the first result I found (from Kirk Haselden, no less!) didn't actually work, I thought I might as well blog on this myself.
So how do you do this? The key [1] is using the Derived Column transformation in your package data flow, and using the correct expression. Your package's data flow should look something like this [2]:



What's not obvious from the image is what the Derived Column transform is actually doing. This depends, of course, on the input value. Here's the first question from the forums:

My source database stores dates as integers (e.g. 20070101). I need to convert to a "real" date for my target system.

I'm guessing I need to create a derived column - could someone help me out with the appropriate expression?
In this scenario, you must first convert the integer "date" to a string, and then parse that string to get it into the correct format, and then cast that string into a date. Here's the expression:
(DT_DATE)((SUBSTRING((DT_WSTR,8)IntegerDate,5,2) + "-" +
SUBSTRING((DT_WSTR,8)IntegerDate,7,2) + "-" +
SUBSTRING((DT_WSTR,8)IntegerDate,1,4)))
The next question from the forums was this:
Is it possible to convert a date stored as a string into a datetime with integration services 2005? My attempts with the “data conversion” fail. The string type form of the date is ‘yyyy-mm-dd’ and the desired result for use in a Union All is ‘dd/mm/yyyy 12:00:00AM.’
The only thing different between this problem and the one before is in the expression required to perform the conversion. In this case, since we have a string input already, the expression is a little simpler:
(DT_DATE)((SUBSTRING(StringDate,6,2) + "-" + SUBSTRING(StringDate,9,2) + "-" +
SUBSTRING(StringDate,1,4)))
To test these expressions, I created a few simple tables and populated them with sample data. Each test scenario got three valid date values, and one value that is valid for the underlying storage type (integer or string) but not a valid date. Here are the SQL Scripts:

CREATE TABLE Dates
(
    IntegerDate INT NOT NULL,
    StringDate CHAR(10) NOT NULL
)
GO
INSERT INTO Dates VALUES (20081231, '2008-12-31')
INSERT INTO Dates VALUES (20070101, '2007-01-01')
INSERT INTO Dates VALUES (20010130, '2001-01-30')
INSERT INTO Dates VALUES (20051232, '2005-12-32')
With this as the source, we end up with results like this when we run the package:
[3]

It's vitally important to have both a success path and an error path coming from your Derived Column transform. There is no domain integrity in the source column, so there is no way to be certain that the values you're getting from the source can be successfully converted into dates. Unless you set up your package like this, the first time you get a bad date, your package will fail, instead of handling the error gracefully.


[1] Actually, the key is always using the best tool for the job. If you're getting your data from a relational database, you should probably just do the conversion there using CAST or CONVERT (there's a quick sketch of this after these notes). This will make for a simpler package and usually better performance as well.
[2] Although in the real world, you may want to actually write your records to some destination instead of just counting the rows.
[3] When was the last time a developer got four dates and three of them were good and only one of them was bad? Perhaps I should have come up with a more realistic scenario, eh?
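
To put a finer point on footnote [1]: if the Dates table above were the real source, the package could skip the Derived Column gymnastics entirely and let the source query do the work - something like this (style 112 handles the yyyymmdd integers, style 120 handles the yyyy-mm-dd strings):

SELECT IntegerDate,
       StringDate,
       CONVERT(DATETIME, CAST(IntegerDate AS CHAR(8)), 112) AS IntegerDateConverted,
       CONVERT(DATETIME, StringDate, 120) AS StringDateConverted
FROM dbo.Dates;

Of course, the bogus fourth row in the test data will make this query fail with a conversion error, which brings us right back to the point about always having an error path for data you don't control.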

Going to TechEd!

Microsoft's biggest technical conference - TechEd - is taking place next week in Orlando, and I'll be there. This will be the first conference I've attended since TechEd 1998 in Orlando, when SQL Server 7.0 was the "next big thing." I've been working at many of Microsoft's big conferences for the past few years, including TechEd, PDC, Mix and TechReady, but it's been almost 10 years since I've gone to one without work waiting for me. On one side, it should be a lot of fun, being able to attend sessions and hang out with the product group members in the lounges, but on the other side, it already feels a little lonely.

In any event, I've got all of the interesting-looking BI sessions added to my schedule, so expect me to be blogging when possible throughout the week.

Saturday, May 26, 2007

No! No! No!! Not the PDC!!!

It looks like the Professional Developers Conference this year has been cancelled, or at least indefinitely postponed:

http://msdn2.microsoft.com/en-us/events/bb288534.aspx

We are currently in the process of rescheduling this fall’s Professional Developer Conference. As the PDC is the definitive developer event focused on the future of the Microsoft platform, we try to align it to be in front of major platform milestones. By this fall, however, upcoming platform technologies including Windows Server 2008, SQL Server codenamed “Katmai,” Visual Studio codenamed “Orcas” and Silverlight will already be in developers’ hands and approaching launch, which is where we’ll focus our developer engagement in the near term. We will update this site when we have a new date for the PDC that is better timed with the next wave of platform technologies.


I keep looking at the calendar, and it keeps not being April 1st. This is a huge disappointment. I totally understand the stated rationale - PDC is supposed to be about future technologies past the upcoming release - but it's still quite the blow. Hopefully they'll reschedule soon...

Thursday, May 24, 2007

Watching Yellow Boxes Sucks

Here's a chance for the world to learn from my stupidity. Like that will be the first [1] time, eh?



Here I am, 9:35 PM, 2 hours past my regular bedtime. [2] 582,000 records processed so far. It's been 3 hours, probably more. [3] And here I sit, watching yellow boxes, and waiting for an SSIS package to finish running.






Pretty, eh?


I'm testing some updates done by myself and some team members, and need to verify that the package can successfully complete. When it's done, I need to test the next batch. And I need to have them all done tonight, because I'm taking off tomorrow.

But I don't know how many rows are coming out of the data source!

And therein lies the problem. I forgot to check before I ran the package to see how many rows the source query will return with the new set of test data. It might be just a few more rows. It might be a few million.

So here is a tip from me to you: always test your source query before running your packages.

Because watching yellow boxes sucks.

Here I am, 9:50 PM. 621,000 records processed so far...




[1] Or last.


[2] I have small kids, what can I say?


[3] I'm pulling the data from a remote Oracle database on the other side of a VPN - LAN-local performance had better be a lot better than this, eh?

Dr. Ivan Brady Strikes Again!

If you've ever gotten an email from me, odds are it's had this as the signature:
“Context is practically everything when it comes to determining meaning”
-- Dr. Ivan Brady

Dr. Ivan Brady was my favorite professor when I was in college. He almost convinced me to switch majors from Computer Science to Anthropology. Words fell like pearls of wisdom from his mouth. And this was one of his favorite quotes - he used it a lot.

And still, so many years later, I keep being reminded how incredibly true this is. On so many projects - and BI projects in particular - far too much time is lost on false starts and poor communication. Everyone seems to always want to start working, to start doing something, anything, instead of calming down, establishing a common context, and truly understanding what the other parties are talking about.

I know, this isn't really news, but this week has reminded me once again how true, how profound, and how important this is. If more people listened to Dr. Brady, the world would be a better place, and we'd get a lot more done...

Conditional Task Execution

As we saw in the last post, the combination of SSIS Expressions, variables and precedence constraints gives us a lot of power for determining what execution path our package should take. But what happens if the logic we need to implement isn't quite so straightforward? For example:
  • It's the very first task in the package that you need to execute conditionally.
  • The variable on whose value the task must be conditionally executed is set declaratively via an Expression or Package Configuration, and not procedurally via an Execute SQL (or other) task like we did in the last blog entry.
  • The conditionally executed task is part of a larger control flow, and there are tasks that must execute beneath it, either when it executes successfully, or when its execution is skipped due to the variable value at runtime.
These situations don't pop up every day, but they do happen often enough that SSIS developers should have a handy addition to their toolbox: the Script task.

But wait, you say - you don't need to write VB.NET code just to get around this problem, do you?

And, of course, the answer is no - no code is required. The Script task has hidden talents above and beyond its affinity with .NET code: it can sit around and do nothing at all except take up space.

Now tell me that doesn't sound like some .NET developers you know?

In any event, the Script task comes in handy for situations like this because (unlike most other tasks) it is valid in its default state when dragged onto the control flow design surface. You don't need to change anything: Because its default script returns success, it just works - without actually doing any real work, of course - by default.

This means that if we need a "placeholder" task to serve no purpose other than to have a precedence constraint drawn from it (as shown in the first two bullets, above), the Script task is what we need.

It also means that the Script task - when placed inside a Sequence container - does what we need in the situation described in the third bullet, above. Take a look at the image below. In this package, we only want to run the Execute Package task if a flag is set to True via a configuration file, it depends on successful execution of other Execute Package tasks before it, and we need more Execute Package tasks below it to execute either when it completes successfully or when the flag is set to False. Set up as you see below, it works like a charm.





So, when you're looking for the right tool for the job of not actually doing anything, think of your friendly neighborhood developer. Wait, I mean friendly neighborhood Script task. That's right...

Comparing Database Versions

This post was inspired by a recent thread on the MSDN SSIS Forums. Here's the scenario:
Let's say that you have a source database and a destination database. Let's take this one step further and say that you have multiple instances of these source and destination databases deployed in your environment. Let's make things a little more complicated by saying that there can be (and probably are) different versions of these databases deployed, and that you track the current version in a database table. To make matters worse, it's possible that a corresponding source/destination pair might be updated at different times, so their versions might not always match.
This sounds like a pain in the butt, but it also sounds like a pretty common scenario. So how do you ensure that the SSIS package you're developing will only move data from source to destination if their versions match? With SSIS variables and Expressions, the solution is very simple. Here's what's needed:
  1. Two connection managers, one for the source database and one for the destination database.



  2. Two package variables, one to store the version number from the source database and one to store the version number from the destination database.




  3. Two Execute SQL tasks, one to retrieve the version number from the source database and one to retrieve the version number from the destination database and store them in the variables (a sample query is sketched at the end of this post).







  4. A sequence container around the two Execute SQL tasks. This is technically optional, but it makes it easier to build the package logic if you set things up this way.





  5. One precedence constraint configured with a Success constraint and an SSIS Expression instead of just using a Success/Failure/Completion constraint.







  6. Optionally, a second precedence constraint set up with a Completion constraint and the logically opposite Expression, so you can have a second path of execution for when the versions do not match.




  7. The rest of your package logic. This is up to you, I'm afraid.

When you're done, you'll have something that looks a lot like this:



Enjoy!
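
One last note: the query behind each of the two Execute SQL tasks in step 3 can be as simple as the sketch below (the table and column names are hypothetical - point it at wherever you actually track your version numbers):

SELECT MAX(VersionNumber) AS CurrentVersion
FROM dbo.DatabaseVersion;

Set each task's ResultSet property to "Single row" and map the CurrentVersion column to the appropriate package variable. The Expression on the precedence constraint in step 5 then just compares the two variables - something like @[User::SourceVersion] == @[User::DestinationVersion], assuming those are the names you used in step 2.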

Wednesday, May 23, 2007

Flexible File System Deployment

This blog post has been sitting in my drafts folder since the day I created this blog, and has been kicking around in my mind for much, much longer. The real world just keeps getting in the way of my best intentions to blog… Anyway, a thread on the MSDN SSIS Forums today inspired me to get off my hard-working butt and actually get it finished.

So, you want to deploy your SSIS packages to the file system, but also want to be able to deploy them to any path on your test and production servers, not just in the folder that you used in development? You've come to the right place.

The two primary tools that make this possible are SSIS Package Configurations and SSIS Expressions. [1] Here's what you need to do:

  1. Create a string variable named DeploymentFolderAbsolutePath. [2]
  2. Assign the DeploymentFolderAbsolutePath variable a value to reference the absolute path of the root folder of your SSIS Visual Studio [3] solution directory, such as "C:\Projects\ProjectName\SSIS\". Remember to include the trailing backslash character.
  3. Create a string variable named ProjectFolderRelativePath.
  4. Assign the ProjectFolderRelativePath variable a value to reference the name of the current SSIS Visual Studio project directory, such as "ERP.Extract\". Remember to include the trailing backslash character.
  5. Create a string variable named ProjectFolderCalculatedPath. Set the EvaluateAsExpression property of the ProjectFolderCalculatedPath variable to true and set the Expression property of the ProjectFolderCalculatedPath variable to "@[User::DeploymentFolderAbsolutePath] + @[User::ProjectFolderRelativePath]". This will evaluate to the full path to your project directory, including the trailing backslash – something like "C:\Projects\ProjectName\SSIS\ERP.Extract\".
  6. Create a Package Configuration that configures the value of the DeploymentFolderAbsolutePath variable. I personally prefer using an environment variable configuration for this variable – you can script it with a REG file that looks like this:

    REGEDIT4

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]

    "SSIS_Root_Deployment_Path"="C:\\Projects\\ProjectName\\SSIS\\"

  7. At this point, you now have a variable that contains the fully qualified path to the folder that contains your SSIS package. But why did we have to jump through so many hoops? Because we're going to use this over and over and over again. Save this package for use as a template for other packages.
  8. Now, whenever you need to access any file system-based resources, such as flat files or file connection managers used to locate child packages for use by Execute Package tasks, you have a starting point on the file system that moves whenever your deployment folder moves. For example, you can use an expression for the ConnectionString property of a file connection manager to identify the child package, like "@[User::ProjectFolderCalculatedPath] + "ERP_Extract_Stuff.dtsx"".

If you're like me, you design your SSIS projects to have a single "master" package that drives the execution specific to a business process, and a set of "child" or "worker" packages that are called by the master. In this scenario, having this flexible file system deployment framework makes it trivial to manage.

But what do you need to do to make this work?

First and foremost, test and play in your own environment to find out what works best for you. Once you've done this, save your package for use as a template, and for every package that you create from this point on, you must create it from that template. Once this is done, there are a few simple things to do:

  • Whenever you create a new project, change the value of the ProjectFolderRelativePath variable to reflect the new project folder name.
  • Whenever you access file system resources, always use an expression based on the ProjectFolderCalculatedPath variable to identify the path of the resource.

It's the second item here that will trip you up, especially if you have many developers on your ETL team. Sometimes people forget. Because of this, it's a best practice to perform regular test executions of your packages from folders other than the default development project folders – AND rename the default folder while you run the tests. If you forget to do this, it is possible for the packages to be built incorrectly with hard-coded paths, and your test will not catch the problem, as the resources in the default location will be used without your knowing it. One of these days I'll write a little test harness using the SSIS object model to look for this problem [4], but it won't be today…

[1] I have another blog post extolling the virtues of Expressions that has been kicking around even longer, but it's taking even longer to finish.

[2] Obviously, you can name the variables whatever you want – this is just what I like to call them.

[3] I love the term "Visual Studio" and for some irrational reason hate the term "BIDS."

[4] And when I do, I'll post the code here.

Tuesday, May 22, 2007

BIGINT Confusion Explained

I've known for some time that to work with BIGINT values from SQL Server in SSIS, you needed to store them in String package variables - but I never knew why. Now, thanks to an excellent post from Matt Masson on the SSIS team, we know why:

http://blogs.msdn.com/mattm/archive/2007/04/18/why-can-t-i-store-my-bigint-result-in-an-int64-variable.aspx

Thanks, Matt!

Monday, May 21, 2007

SSIS Expressions == Declarative Goodness!

Developers love declarative things. It's better to use XAML markup to define your UI than to explicitly define control instances and set their properties in your C# code. It's better to use XSLT to transform your XML data into HTML than it is to manually iterate through DOM objects doing it in code. It's better to specify a set-based SQL query than it is to loop through records with a cursor. You get the idea, right?
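
For the SQL case, here's a quick (completely made-up) illustration of the difference - the table and columns are hypothetical, but the pattern should look familiar:

-- Procedural: walk the rows one at a time with a cursor
DECLARE @OrderID INT;

DECLARE late_orders CURSOR FOR
    SELECT OrderID FROM dbo.Orders
    WHERE ShippedDate IS NULL AND DueDate < GETDATE();

OPEN late_orders;
FETCH NEXT FROM late_orders INTO @OrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET OrderStatus = 'Late' WHERE OrderID = @OrderID;
    FETCH NEXT FROM late_orders INTO @OrderID;
END
CLOSE late_orders;
DEALLOCATE late_orders;

-- Declarative: describe what you want and let the engine figure out how
UPDATE dbo.Orders
SET OrderStatus = 'Late'
WHERE ShippedDate IS NULL
  AND DueDate < GETDATE();

Same result, but the second version says what should happen rather than spelling out how, and the optimizer is free to do its job.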


So why not carry the same motif into the world of SQL Server Integration Services? I've been spending time lately on the SSIS Forums on MSDN, and have noticed that many of the questions being asked can be addressed by the judicious use of SSIS expressions.

What are expressions? Funny you should ask...

According to SQL Server Books Online:

Expressions are a combination of symbols (identifiers, literals, functions, and operators) that yields a single data value. Simple expressions can be a single constant, variable, or function. More frequently, expressions are complex, using multiple operators and functions, and referencing multiple columns and variables.

According to Matthew:

Expressions are a gift from Odin himself, a blessing upon the world of SSIS developers, a magical spear carved from the World Tree itself, with which to bring death unto the plague of static, difficult-to-maintain packages.

Ok, so sometimes I get carried away. Still, expressions are probably the single most powerful tool available to SSIS package developers. SSIS expressions allow you to update the properties of tasks, components, variables, connection managers - pretty much anything in your SSIS package - so that the values of those properties are dynamically evaluated at runtime. Think of expressions as if you could arbitrarily extend the properties of your favorite .NET objects so that when the properties were accessed at runtime your code - and not the code in the property accessor - was run to determine the value to be accessed. It's not exactly what SSIS expressions do, but it's a decent analogy. The key thing to remember is that SSIS expressions allow you to hook up dynamic values to almost any property of almost any component in your SSIS packages so that your packages can respond to changes in their runtime and design time environments. What could be cooler than that?


With all of this said, SSIS expressions in their current form are not perfect. What would make SSIS expressions even better?
  • True ubiquity: Currently not all properties of all components (especially data flow components "hidden" within the data flow task) can be bound to expressions.

  • Better discoverability: Currently there is no way to tell if a specific property is based on an expression simply by looking at that property in the Visual Studio Properties window. You need to expand out the expressions property node and see if the property is listed there. What we need is some sort of visual indicator next to the properties. Perhaps an icon, perhaps a different text color, but definitely something.

  • Better editor support: The current expression editor is definitely a v1 tool, and could use a lot of usability work.
Wow - can you believe how long it took me to finish this post? I started writing it in May, and didn't finish it until September. I think I might be a little busy. Thank goodness for the online expense reporting system I had to use today, which gave me lots of time to work on other things while it did whatever it was doing to take so long for each page to load. ;-)

In any event, back in May I had envisioned a series of articles that each introduced a different problem and how it could be solved by using SSIS expressions. I'm not sure that this series is coming any time soon, but I'll work on it as time permits.

Until then - can you believe there's a post in my Drafts folder that is older than this one?

Sunday, May 20, 2007

Data-Mining Goes Mainstream

You know that a given technology field is "going mainstream" when it starts to get moderately intelligent coverage from the non-technology mainstream press. So when I saw today an article titled "Reaping Results: Data-Mining Goes Mainstream" in the "Your Money" section of The New York Times, I had to give it a read.

You can find it online here (free registration is required): http://www.nytimes.com/2007/05/20/business/yourmoney/20compute.html

It's not anything earthshaking or groundbreaking, but I think it does a decent job of presenting a complex BI technology in a manner that non-technical people can appreciate and understand. If you're in a position where you need to present data mining and BI to someone who "just doesn't get it" (especially since they always have the biggest budgets!) this might be worth sharing with them.

Saturday, May 19, 2007

Photomerge

This isn't a business intelligence post, but one of the reasons I picked the blog name "BI Polar" [1] was because I tend to jump around a lot from topic to topic. Anyway...


...I just discovered a feature in Adobe Photoshop called "Photomerge." It's located on the File > Automate submenu and basically "stitches" together a series of photographs to make a larger single panoramic image. Take a look at this example:





(click on the thumbnail for a larger version)





Around a month ago we got a very unseasonable spring blizzard. We got over two feet of wet, heavy snow in around 10 hours, and it brought down one of my neighbor's trees onto the fiber optic cable through which we get both dial tone and internet. So I was out there in the thick of it shoveling the driveway so the repair crews could get to the mess. Before I got started I took a series of six photos, not really making any effort to make them line up with each other.

But Photoshop took them all and with no user input (other than selecting the input files) created the single panoramic image you see above.

Ok, so I may be easily impressed (all of you graphic artists and designers can rest easy, knowing your jobs are secure) but this is a damned impressive feature in my book. I stumbled across it looking for something else, and am glad I did. I wonder if I should buy a Photoshop book or something...

[1] Other blog names considered but ultimately rejected included:
  • BI and Proud
  • BI Curious
  • The Big BI Guy (sort of a homage to "SoCalDevGal")

But for some reason people seemed to think that these would give the wrong impression or something. ;-)

Thursday, May 10, 2007

A Great Quote

Here's a great quote from Jeff Raikes made during his BI Conference Keynote:

"Strategy without action is a daydream.
Action without strategy is a nightmare."

It's a little trite, but definitely profound. I wish I'd thought of it first.

Put a Consultant out of Business

Performance tuning is something of a black art. It's one of those things where, before you can do it, you need to understand - really understand - what's going on, not only in your application, but in each of the platform components that make up your application.

SQL Server has gotten easier to tune with every release. SQL Server 2005 has lots of features built in to make it easier to diagnose performance problems and to build better performing applications from the beginning. But SQL Server 2005 also has lots of architectural changes under the hood, which means that in order to dig deep, you need to understand those changes.

But Microsoft has recently "stealth released" a set of SQL Server Reporting Services reports that provide deeper and more actionable insight into the performance of your SQL Server than has ever been possible before without access to a top-tier performance consultant. The SQL Server 2005 Performance Dashboard Reports report pack integrates directly into SQL Server Management Studio (using the new "Custom Reports" feature in SP2) and uses real-time performance data from SQL Server's dynamic management views (DMVs) to identify poorly performing queries and processes, show you the details of the problem, and (this is the big one) identify steps to take to resolve it. This often includes SQL DDL code to create indexes, statistics, or whatever other changes might make your system run better.
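
To give a sense of what's under the hood, here's the flavor of DMV query this kind of reporting is built on - this particular one is just a sketch of mine, not something lifted from the report pack, and it lists the ten cached statements that have burned the most CPU:

SELECT TOP 10
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

The Performance Dashboard Reports wrap this kind of raw DMV data in charts, drill-downs and recommendations, which is exactly why they're so handy.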

Realistically, as a SQL Server consultant I'm not really afraid of being made obsolete by these amazing tools. There are some decisions that can only be made, and some problems that can only be solved, with a truly deep knowledge of the platform. These reports do, however, push out the point at which the average SQL Server DBA needs to ask for help, and that means that the consultant is more likely to have a really interesting problem to solve when he is called in. And that can only be a good thing!

Be Gone, Evil Formatting - I Cast Thee Out!!

You know the feeling - you copy something from your web browser and go to paste it into whatever application you're working in. The text comes through, but with it comes loads of annoying formatting that you really do not want. If you're using Office 2007, it's easy enough to use the little smart-tag to say "keep text only" but if you're using just about anything else (and if you're like me) then your blood pressure rises just a little more as you launch notepad, paste the text, copy it back out of notepad (now delightfully format-free) and paste it into your target application once more.

I've done this at least a thousand times so far this year. And I'm sure I'm not alone. In fact, I did it once in my last blog entry, and that one time broke the proverbial camel's back.

Enter PureText: http://www.stevemiller.net/puretext/.

PureText is an application that allows you to paste the text that's on the Windows clipboard - just the text, and nothing else - into any application, format-free and blood-pressure-neutral. It defines a global keyboard shortcut (Win-V) by default so you never need to take your hands off the keyboard. What could be better?

This isn't a new tool, but it is new to me. I'm sure that once I can get my fingers trained to hit Win-V instead of ctrl-V, I'll swear a lot less. Check it out.

Katmai == BI++ ?

Microsoft positioned SQL Server 2005 as the "BI release" of its flagship SQL Server database product. It's not hard to see that this release saw the biggest leap forward in Business Intelligence features, with SSIS, SSAS and SSRS, along with usability, scalability and feature improvements in the core RDBMS. How can they top this?

I have a feeling we're about to see how. ;-)

At the first-ever Microsoft Business Intelligence Conference this week in Seattle, Microsoft has started to unveil some of its plans for the next release of SQL Server, codenamed "Katmai." Sadly, my travel schedule did not permit me to be there, but that does not prevent me from keeping an eye on the news coming out of the conference. There's not yet a lot of "hard" information coming out, but Microsoft has started launching press releases with "BI" and "Katmai" prominently featured. Take a look at the conference link above - there is a keynote video from Jeff Raikes (President, Microsoft Business Division) as well as podcasts from Product and Program Managers and key BI partners. Hopefully there will be more coming soon...

...and I cannot wait until I get my hands on that Katmai CTP that everyone is hinting at...

Tuesday, May 8, 2007

GhostDoc

Thanks to the "Toolbox" column in this month's MSDN Magazine, I have recently discovered an amazing and free developer tool: Roland Weigelt's GhostDoc. GhostDoc is an add-in for Visual Studio that will automatically add XML comments to your C# or VB code.

Who cares? You do! Doesn't Visual Studio 2005 have XML comments built in already? Yes, but not like this!

The cool thing about GhostDoc is that not only does it add the XML comment structure like Visual Studio does, it also adds the information that you as developer would otherwise need to add. It has its own built-in rules to parse through your code to come up with basic descriptions of what each member does (yes, you can define your own rules, although I've yet to do so myself) and populates the XML comments appropriately.

I've been spending so much time on BI projects lately that until the last few weeks I haven't done any "real" traditional .NET development since the beginning of the year. Lately, however, I've been developing some custom SSIS Control Flow and Data Flow components and a few utilities for batch updates to large numbers of SSIS packages (hopefully I'll have something to share on both of these fronts soon), so I've been spending some quality time with C# again. And GhostDoc is proving to be a major time-saver, especially for an anal-retentive comment junkie like myself. Take a look at this example:

I started out with this C# method:


public void CreateVariable(string variableName, TypeCode dataType,
    bool readOnly, string variableNamespace, object value,
    bool evaluateAsExpression, string expression,
    string description)
{
    object variableValue;

    switch (variableNamespace)
    {
        case "System":
            throw new ApplicationException("You cannot create new System variables");
        case "":
            variableNamespace = "User";
            break;
    }

    if (SsisHelper.IsCastValid(dataType, value))
    {
        variableValue = Convert.ChangeType(value, dataType);
    }
    else
    {
        throw new ApplicationException(string.Format(
            "Data type mismatch: unable to convert supplied value '{0}' into requested data type '{1}'",
            value, dataType));
    }

    if (VariableExists(variableName, variableNamespace))
    {
        ssisPackage.Variables.Remove(variableNamespace + "::" + variableName);
    }

    Variable v = ssisPackage.Variables.Add(variableName, readOnly, variableNamespace,
        variableValue);

    v.Description = description;

    if (evaluateAsExpression)
    {
        v.EvaluateAsExpression = true;
        v.Expression = expression;
    }
}


And with a single keyboard shortcut (Ctrl-Shift-D by default) GhostDoc added this XML comment header to the method:


/// <summary>
/// Creates the variable.
/// </summary>
/// <param name="variableName">Name of the variable.</param>
/// <param name="dataType">Type of the data.</param>
/// <param name="readOnly">if set to <c>true</c> [read only].</param>
/// <param name="variableNamespace">The variable namespace.</param>
/// <param name="value">The value.</param>
/// <param name="evaluateAsExpression">if set to <c>true</c> [evaluate as expression].</param>
/// <param name="expression">The expression.</param>
/// <param name="description">The description.</param>
Although some tweaking and editing will still be required, having this as a zero-effort starting point is a huge step forward. If you're doing any .NET development, you owe it to yourself to check out this great tool!

You can download GhostDoc for free here: http://www.roland-weigelt.de/ghostdoc/

BI the Book!

Currently there are not many books available to help people get up to speed on the Microsoft Business Intelligence platform. Most of the books I've found assume a decent level of technical knowledge in the reader. This isn't a problem for people who have been working in the field for many years, but not everyone has the experience necessary to get them kick-started in this complex arena. To make matters worse, there are so many different technologies that come together in the world of BI that it can quickly become overwhelming for people who are trying to get up to speed. I get asked pretty regularly if there is a good book for beginners to BI - and finally I can say "yes!"

Apress has just published "Foundations of SQL Server 2005 Business Intelligence" by my good friend and colleague Lynn Langit. This book provides a solid introduction to all of the core components in the Microsoft BI platform, including Integration Services, Analysis Services and Reporting Services, and has excellent coverage of Office 2007, SharePoint 2007 and PerformancePoint Server 2007 as well. It's not going to make you an expert, but it will give you the knowledge and tools you need to get prepared and productive for your first BI projects.

In the interest of full disclosure, I served as the technical reviewer for this book. It was a real pleasure to work with Lynn during the writing process, and I had a lot of fun. Lynn, on the other hand, had to put up with my overly critical and picky feedback for months. It says a lot about her patience that she still talks to me. :-)

And yesterday I got an email from Ronald Smiley, a fellow Microsoft Certified Trainer, who said:

Mathew,

Just a note to deliver huge "KUDOS" for the book Foundations of SQL Server 2005 Business Intelligence by Lynn Langit. You are listed as the technical reviewer. Lynn gave a free copy of this book to the first 20 MCTs who responded to her, I received my copy last week and have been enthralled with its' straightforward comprehensive approach to BI. I have been delivering training and implementing SQL solutions since 6.5 and I think this book struck the right balance of being informative and yet not overwhelming.

Again, Thanks to both of you!

Ron Smiley, MCT, MCDBA, MCTS, MCTIP


And that just about says it all - Thanks, Ron! Definitely check out the book - it should be on store shelves everywhere by now - and start working with Microsoft's incredible BI tools!

Welcome to the BI Polar Blog

I've been working for more than 10 years as a software developer and trainer, with a strong focus on Microsoft database and development technologies. If it has to do with SQL Server or the .NET Framework, odds are I've been there and done that. For the last year or so I've been working more and more with the Microsoft Business Intelligence platform, specifically SQL Server 2005 Integration Services, Analysis Services, Reporting Services and Microsoft Office 2007. This is a great toolset, but I haven't found the same level of community support online that I'm accustomed to seeing with other more mainstream Microsoft development technologies like .NET and SQL Server.

So... in an effort to start doing my part, I'm starting a blog to share my pains and joys and lessons learned with the Microsoft BI community at large. I'm pretty busy these days working on some exciting BI projects, but hopefully I'll still be able to post regularly. Stay tuned!