Wednesday, December 31, 2008

Christmas Came Late!

Santa has just delivered the gift that many SSIS developers have had on their Christmas lists: more samples and options for programmatically building SSIS packages. Although in this case, the role of Santa is played by Matt Masson of the SSIS team, who must have gained quite a bit of weight since the last time I saw him. Matt posted seven new articles on his blog yesterday, including this index post here: http://blogs.msdn.com/mattm/archive/2008/12/30/samples-for-creating-ssis-packages-programmatically.aspx

There are samples for building a package with a data flow task, and for adding OLE DB Source and Destination components, adding the ADO.NET Source component, adding a Row Count transformation and adding a Conditional Split transformation. Each post includes nicely commented C# code, and each one goes a long way towards filling the documentation gap around the SSIS data flow API.

And if that's not exciting enough, Santa was just getting started. Evgeny Koblov, a tester on  the SSIS team has gone far beyond simply it easier to work with the less-than-intuitive COM API exposed by the SSIS data flow. He has built a better API. It's called EzAPI and you can read about it on Matt Masson's blog here: http://blogs.msdn.com/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx. You can also download it and start using it on the CodePlex web site here: http://www.codeplex.com/SQLSrvIntegrationSrv/Release/ProjectReleases.aspx?ReleaseId=21238

EzAPI is probably the most exciting thing I've seen coming into the world of SSIS since the release of SQL Server 2008, if not before. It's essentially a native .NET wrapper around the underlying COM API, which doesn't sound particularly interesting at first glance, but since it delivers the ability to quickly and easily build SSIS packages and data flows through code, it's sure to be a time-saver (if not life-saver) for many SSIS developers.

Now please excuse me while I download EzAPI and start to play...

Tuesday, December 23, 2008

Math and Terror

My son just celebrated his eighth birthday. While he was eating his birthday breakfast before heading to school, he came out with this gem:

"I'm half of sixteen today!"

Now part of me is happy that he's thinking of mathematical relationships at the breakfast table. But another part of me is thinking how I am so incredibly not yet ready for sixteen. These first eight years have gone by so quickly, I can't begin to imagine how quickly the next will fly.

So the next time you're wondering if the glass is half empty or half full, just be glad that it isn't half of sixteen...

Thursday, December 18, 2008

Better File Rename

This is just a random tool recommendation - I've been meaning to post it for months, but have never made the time. Well, a meeting just finished early, so the time made itself.

I've love this tool: Better File Rename

It's a Windows shell extension that gives you a ton (maybe more) of batch file renaming options. Right-click on a folder or a group of files and you can:

  • Add characters to the beginning of a file
  • Change the file extension
  • Add sequence numbers to the file name
  • Add date/time stamps to the file name
  • Replace characters or string patterns in the file name
  • A staggering number of other possibilities

I've been using this delightful little tool since 2002, and it works great on Windows XP, Windows Vista, Windows Server 2003 and I assume Windows Server 2008 as well, although I've never personally tested it. If I ever rebuild my primary laptop to run Server 2008 then I'll know, because this app is one of the few 3rd party "must have" utilities that I install whenever I rebuild my main machine.

Ok, that's that. Check it out - download a trial version and see if it rocks your world too. Especially with all of the Christmas photos you'll soon be taking, you know you want it...

Monday, December 15, 2008

Changes in SQL Server 2008

MCT Russ Loski recently shared this great link with the SQL Server trainer community:

http://msdn.microsoft.com/en-us/library/cc280407.aspx

This is the top-level "Backward Compatibility" topic from SQL Server 2008 Books Online, and includes sub-topics for the DRBMS, SSAS, SSIS, SSRS and replication, covering deprecated features, behavioral changes, breaking changes and more.

If you're looking into what's new and different (and what might bite you if you're not careful) with SQL Server 2008, this is a great place to start. Thanks, Russ!

Wednesday, December 10, 2008

"Bonus" Technical Deep Dive Session at the MCT Summit - Redmond and Prague

The vast majority of the content at the 2009 MCT Summit events is heavily weighted toward the IT Professional audience. This is largely due to the fact that the major developer and database product releases took place in 2007 and 2008, while there are significant releases in-flight for Windows and Exchange Server. But I believe there should still be some deep technical content for at least one underrepresented trainer audience: the BI developer.

So I'm going to fill in this gap by presenting an "off the schedule" technical deep dive on SQL Server Integration Services. I did something similar at the 2008 Redmond summit and it was very well received, so I'm going to model the 2009 Summit session on the same model. Here's the deal:

“Everything You Ever Wanted to Know About SQL Server Integration Services but Were Afraid Your Students Would Ask”

In this technical “deep dive” session, Matthew Roche will take attendees on a wild and sometimes horrifying ride into the dark underbelly of real world SSIS development that existing SSIS books and courseware doesn’t effectively cover, including development and deployment best practices, data flow internals and performance tuning and more. But be warned – there will be no fixed agenda for this session! The topics covered will be driven by attendee involvement, so the more questions you bring to the session the more everyone will get out of it. If you teach (or fear you may be asked to teach) the SSIS courses for SQL Server 2005 or SQL Server 2008, this is a session you don’t dare to miss.

Does this sound interesting to you?

If you are attending either event (Prague or Redmond) and would be willing to attend this session after the summit sessions end one day[1] then please reply here[2] to express your interest. If anyone (even one attendee) is interested then I will come prepared to spend as much time as necessary (I think we ran around three hours at the 2008 Summit in Redmond) to give each attendee everything he needs. So speak now or forever hold your data...

[1] This is a key point, as it will mean skipping on some other planned after-hours event, I'm sure.

[2] Or in the microsoft.private.mct.mctsummits newsgroup.

Friday, December 5, 2008

Connection Strings

This is just a little reminder for myself as much as for anyone else. There's a great resource online at http://www.connectionstrings.com/ that lists the syntax for hundreds of different types of data sources (SQL Server, Oracle, Excel, Active Directory, etc.) complete with examples and general rules/guidelines for constructing them. If you're ever in a situation where you need to build a connection string without a GUI to help you, I've never seen a better reference. Check it out.

Thursday, December 4, 2008

ThinkPad Keyboard Madness

I own lots of laptops. After being an independent software development and BI consultant for many years, I've assembled quite the collection. I have Dells and Toshibas[1] and have had Gateways and HPs as well. My little home office is filled with them[2].

I also have a Lenovo. It's a ThinkPad X300, and if I never needed to type on it, it would probably be one of my favorite computers. It's light, it's powerful, it's solid, and just about everything you could want a laptop to be.

But I do have to type on it. And because of this, I swear. A lot. In fact, I swear so much I just may need a new chart.

Why all the swearing, you ask? Well, before I answer, please take a look at your keyboard, down in the bottom left corner. Go on. I'll wait.

There.

You see, if you're typing on a ThinkPad, you'll probably[3] notice that the bottom left key is the control "Ctrl" key, and to the right of that (if you're using a laptop anyway) is the function "Fn" key.

Still with me? Good.

Now if you were horribly evil in a past life and are currently being punished by having to type on a ThinkPad, you'll instead notice that the Fn and Ctrl keys have been reversed. Swapped. Inverted.

And this is insane!

Ok, you ask - what's the big deal? Who cares?

First and foremost - me. I care a lot. You see, I'm a developer. And while I don't buy in with the whole developer stereotype[4] I definitely am a very keyboard-centric computer user. I prefer to use my keyboard whenever possible, and there are literally thousands of times each day when I perform some task that utilizes a Ctrl+Key keyboard shortcut. For example, you may be familiar with some of these well-loved productivity enhancers:

  • Ctrl-C = Copy
  • Ctrl-V = Paste
  • Ctrl-Arrow (Left or right) = Jump one word in text
  • Ctrl-Shift-Arrow (Left or right) = Jump and highlight one word in text
  • Ctrl-F4 = Close current tab or child window

I could go on and on and on. A significant portion of the most common Windows keyboard shortcuts involve the Ctrl key.

Now at this point you're probably asking "But Matthew, since this incarnation of ultimate evil that Lenovo uses for a keyboard inverts the Ctrl and Fn keys, what do these keyboard shortcuts do if your finger, trained through decades of computer use, hits the Fn key instead?"

Funny you should ask. This is what happens:

  • Fn-C = Nothing
  • Fn-V = Nada
  • Fn-Arrow (Left or right) = Zip
  • Fn-Shift-Arrow (Left or right) = Zilch
  • Fn-F4 = Puts the computer to sleep!!!!!

That's right. If you try to close out a child window and let your fingers follow the paths that are ingrained deep in their muscle memory, you have just effectively closed the lid on your accursed laptop and have shut it down. Even if you're in the middle of working. Even if you're in the middle of a giant download that then needs to be restarted. Even if you're connected remotely to the corporate network and you then need to dig out your smart card so you can re-connect. Even if you're hosting a LiveMeeting that your boss' boss' boss is attending. Yeah. Even then.

I could go on. I could rant about how there is no way to re-map these keys, because the Fn key isn't exposed to the standard Windows keyboard APIs. I could rave about how the Esc key is on its own row above the function keys (F1 through F12) so that when I try to hit F2 to rename a file I get F3 (search) and when I attempt to hit Esc I get F1 (help) instead. I could blow my top and go on and on about how when I use my other laptops (which outnumber my lone ThinkPad 7:1 at this point) the lessons that my poor fingers are being forced to learn are tripping me up when I use normal keyboards. I could go on and on and on...

But I won't. Because it's far too late for a rant like this to go on any further. Instead, I'm going to fill out a little survey that Lenovo has put together. And I encourage you to do the same. Please, please, please do the same.

I think it's great that Lenovo is conducting such a survey. But I think it is very frustrating that they're just getting around to asking. Take a few minutes and Live Search around on "ThinkPad" and "keyboard" and you'll see what I mean - dozens of other frustrated ThinkPad owners have poured their anguish into the ether as well. (My favorite one is here.)  And even more frustrating, the survey doesn't even mention the Fn and Ctrl key positions - it focuses on the Esc, Home, End, PgUp and PgDn keys instead. Ugh...

But hopefully they're listening now. This is what I told them at the end of my survey, in response to the question "Of the keys that vary between different key layouts, for which keys are you most concerned about their location (e.g. Esc, Home, End, PgUp, PgDn, others, etc.)?":

"The Ctrl key MUST be in the bottom-left corner, immediately below the Shift key. If this is not the case in the hardware then there MUST be a way to re-map these keys in the OS or BIOS. Yes, you need to keep your historic users happy, but you also need to understand the basic mechanics of using keyboard shortcuts and the PAIN that users feel when attempting to adjust to the ThinkPad's non-standard layout. Until one of these solutions is implemented I will never buy or use another ThinkPad."

And I won't. It pains me that much. There are lots of small, light, powerful laptops out there just waiting for my hardware budget, and only the ThinkPad comes with a keyboard that makes my typing life miserable every day. It's a show-stopper for me, and I will never buy another laptop without looking at the keyboard first.

But now it's time for me to sleep. And hopefully to dream of non-ThinkPad keyboards...

[1] Which is another story, but probably not one I'll rant about just yet...

[2] And, in the winter, heated by them.

[3] I saw "probably" simply because I have not owned one of every type of computer ever made, but for my own personal sample "probably" means 100% of the time.

[4] For example, I dislike Red Bull, don't drink Mountain Dew, and I'm tall, good looking[5] and very comfortable around members of the opposite sex. ;-)

[5] Humble, too.

Know Your Data... Werewolves

(That's right. Werewolves.)

Over a year ago, I posted a list of SSIS best practices, and I've presented on this topic a dozen or more times since then at conferences and user groups and seminars. One of these best practices was "Really know your data – really!" and the advice went something like this:

"If there is one lesson I've learned it is that source systems never behave the way you expect them to and behave as documented even less frequently. Question everything, and then test to validate the answers you retrieve. You need to understand not only the static nature of the data - what is stored where - but also the dynamic nature of the data - how it changes when it changes, and what processes initiate those changes, and when, and how, and why. Odds are you will never understand a complex source system well enough, so make sure you are very friendly (may I recommend including a line item for chocolate and/or alcohol in your project budget?) with the business domain experts for the systems from which you will be extracting data. Really."

Well, last night I stumbled across some old meeting notes that reminded me of why this is so very important. The notes were incredibly brief, and consisted largely of a quote from the client project owner. But before I share the quote, please let me share the context and story[1]...

I was helping to design and build a BI application that my client was then re-selling as a service to their customers.[2] We were loading in data from a variety of source systems that provided the same type of data, but which were implemented by different vendors. One of the systems, called FooBar[3] was consistently causing us problems. The data import was running without error, and everything looked good on the surface, but the FooBar-using customers were unhappy[4] because the data they were seeing through the BI portal was incorrect.

Yuck.

The client project owner had, years earlier, worked with the development group that built FooBar, and with his insight we were able to discover that occasionally the data export process in FooBar was failing silently, so that we were getting only partial exports, and there was no way to tell - no error was raised, no checksum existed, and because of some nasty internal details (which I won't go into here) the effort involved in updating FooBar would be very significant. It was not a good time.

So what's the quote? It was this:

"You guys planned for wolves. FooBar is werewolves."

Even after all this time, I still remember the deadpan delivery for this gem. And it was so true. We'd gone in thinking we were prepared for all of the usual problems - and we were. But we weren't prepared for the horrifying reality of the data problems that were lying in wait. We weren't prepared for werewolves.

So what's the moral of the story? Plan for werewolves. Assume the worst. Test early and often, and test failure scenarios, not just happy-day scenarios. Because the time to learn that there really are werewolves is when there is still time to pack a crossbow and some silver bolts, not when the full moon is rising.

Trust me.

 

[1] Names have been changed to protect the innocent. And the other people too...

[2] This has been a recurring theme for me, so it's probably not the company you're thinking of.

[3] I changed the names, remember?

[4] I may also use a literary tool called "understatement" in this story...

Choosing the "Right" SSIS Configuration Location

One of the questions I get asked quite often boils down to this:

"Should I store my SSIS configuration data in SQL Server or in XML configuration files?"

And my every-so-helpful answer usually is:

"Yes!"[1]

That's right - very accurate, but not very useful.

Of course, for this question isn't quite as simple as it might seem at first glance, and there really is no single answer that is both generally accurate (like mine is) and useful (which mine is not) at the same time. The truly useful answer is more complex, because it relies on the context of the question more than it relies on the question itself. There is no "best" or "correct" or "right" choice for configuration storage in general - it all depends on the packages, the project and the project team.

What does this mean? I like to summarize it something like this:

  • If you are using a file system deployment, it probably makes more sense to use XML configuration files.
  • If you are using a SQL Server deployment, it probably makes more sense to use SQL Server configurations.
  • If your ETL solution is managed by the application owner or server administrator, it probably makes more sense to use XML configuration files.
  • If your ETL solution is managed by the database administrator, it probably makes more sense to use SQL Server configurations.
  • If project team members and/or administrators have past experience and success with a given configuration type, it probably makes sense to use that type unless there is some compelling project-specific reason to do otherwise.

Notice that there's nothing in this list about the capabilities of the different configuration types. I've never found a problem that could be solved with one configuration type that could not also be solved with the other.

Of course, there are differences in the approaches that you use when selecting and implementing a configuration strategy for an SSIS application. And it's not always obvious how to get started, or where to go one you do get started. Fortunately, there are quite a few resources available to help make the right decision for you and to implement your configuration strategy correctly the first time. In particular, there is a small set of MSDN community articles[2] written by SSIS-focused SQL Server MVPs about SSIS configurations. Take a look at these:

The articles are listed in the order of "most likely valuable" so if you don't have time to read all four, you should read the top ones first.

Check it out, and good luck!

[1] I blame my mother for this. My childhood was filled with conversations that went like this: "Are we having chicken for dinner, or fish?" "Yes!" "Um..." ;-)

[2] There is an interesting story behind these articles. Shortly before SQL Server 2008 shipped, the SSIS team asked a bunch of MVPs if they would like to write SSIS articles for MSDN. There was no guidance or suggestions provided on what topics these articles should cover. But all but one of the MVPs who provided articles independently chose to write about some aspect of configurations. To me this reinforces my belief that configurations are both one of the most important and least understood features in all of SSIS.