Thursday, December 4, 2008

Know Your Data... Werewolves

(That's right. Werewolves.)

Over a year ago, I posted a list of SSIS best practices, and I've presented on this topic a dozen or more times since then at conferences and user groups and seminars. One of these best practices was "Really know your data – really!" and the advice went something like this:

"If there is one lesson I've learned it is that source systems never behave the way you expect them to and behave as documented even less frequently. Question everything, and then test to validate the answers you retrieve. You need to understand not only the static nature of the data - what is stored where - but also the dynamic nature of the data - how it changes when it changes, and what processes initiate those changes, and when, and how, and why. Odds are you will never understand a complex source system well enough, so make sure you are very friendly (may I recommend including a line item for chocolate and/or alcohol in your project budget?) with the business domain experts for the systems from which you will be extracting data. Really."

Well, last night I stumbled across some old meeting notes that reminded me of why this is so very important. The notes were incredibly brief, and consisted largely of a quote from the client project owner. But before I share the quote, please let me share the context and story[1]...

I was helping to design and build a BI application that my client was then re-selling as a service to their customers.[2] We were loading in data from a variety of source systems that provided the same type of data, but which were implemented by different vendors. One of the systems, called FooBar[3] was consistently causing us problems. The data import was running without error, and everything looked good on the surface, but the FooBar-using customers were unhappy[4] because the data they were seeing through the BI portal was incorrect.


The client project owner had, years earlier, worked with the development group that built FooBar, and with his insight we were able to discover that occasionally the data export process in FooBar was failing silently, so that we were getting only partial exports, and there was no way to tell - no error was raised, no checksum existed, and because of some nasty internal details (which I won't go into here) the effort involved in updating FooBar would be very significant. It was not a good time.

So what's the quote? It was this:

"You guys planned for wolves. FooBar is werewolves."

Even after all this time, I still remember the deadpan delivery for this gem. And it was so true. We'd gone in thinking we were prepared for all of the usual problems - and we were. But we weren't prepared for the horrifying reality of the data problems that were lying in wait. We weren't prepared for werewolves.

So what's the moral of the story? Plan for werewolves. Assume the worst. Test early and often, and test failure scenarios, not just happy-day scenarios. Because the time to learn that there really are werewolves is when there is still time to pack a crossbow and some silver bolts, not when the full moon is rising.

Trust me.


[1] Names have been changed to protect the innocent. And the other people too...

[2] This has been a recurring theme for me, so it's probably not the company you're thinking of.

[3] I changed the names, remember?

[4] I may also use a literary tool called "understatement" in this story...

1 comment:

atlas245 said...
This comment has been removed by a blog administrator.