Wednesday, July 20, 2011

Back in the SSISaddle

Testing…

Testing…

Is this thing on?

Although I still get the occasional comment on older blog posts, I assume that pretty much nobody subscribes to my blog anymore for SSIS information. When I joined Microsoft Learning in October 2008, the SQL/BI/SSIS side of my life got pushed to the side. I kept working with the tools, but it was no longer a core part of my job, and one of the things I largely gave up on blogging and related technical community efforts. It wasn’t a change I wanted to make, but I needed to ensure that I rocked my new job as much as possible. The end result was that over the last 30 months I’ve posted fewer than ten SSIS-related posts.[1]

This will now change.

This week I joined the SQL Server Integration Services team as a Senior Program Manager.[2]

In the months leading up to the SQL Server “Denali” release, I hope to blog regularly on “Denali”-related features, enhancements and techniques. The “Denali” release is a major release for SSIS, with a ton of exciting new capabilities for new users and seasoned SSIS professionals alike.[3] If you are using SSIS already, SSIS in “Denali” is going to make your life easier in many ways. If you’re just starting off with SSIS, the “Denali” release will ease your learning curve, and help you swear less. In short, I’ll have a lot to blog about.

With this said, it may still be a few weeks before I post again. Not only will I be getting up to speed on my new team and my new role, I will also be relocating my family from New York to Redmond.[4] The relocation will be consuming most or all of my free time for the next few weeks, but once I’m settled in you should expect to hear from me more often.

It’s good to be back.

Back in the SSISaddle indeed.

[1] And that I rocked the new job pretty darned hard, but that’s a longer story. Ask your friendly neighborhood Microsoft Certified Trainer.

[2] I wish you could see the smile on my face as I typed that sentence.

[3] If you want an overview of these new things, please be sure to download and view Matt Masson’s presentation from TechEd North America. We’ll have more announcements coming closer to release, but this session is more than enough to whet your appetite.

[4] I have two children, two cats and one wife, and the effort and stress of moving the whole family is staggering. I thought I knew what I was getting into, but boy was I wrong.

Wednesday, October 7, 2009

Transformations are NOT Tasks

This is another one of those “I thought I blogged this years ago” posts. I was having a conversation with a fellow MCT a few days ago, and he described SSIS data flow transformations as “tasks on the data flow tab.” I didn’t say anything then, but this phrase has been sticking in my mind ever since – not unlike a nasty splinter – and I think I need to get it written down before my brain lets me move on.

Why is this seemingly innocuous phrase causing me so much mental distress?

Because transformations are not tasks!

Let’s list a few characteristics of tasks in SSIS. Tasks:

  • Are used in the SSIS Control Flow.
  • Are Containers, with all of the properties (such as the ability to control transaction scope, or to be enabled or disabled) of containers.
  • Support property expressions.
  • Support breakpoints and debugging.
  • Can be individually configured for logging.
  • Are presented as boxes connected by arrows on the BIDS design surface.

The only characteristic that data flow transformations share with tasks is the last one: they’re presented as boxes, connected by arrows.

But the similarities end there. Transformations are metadata-bound functions that operate on data stored in the memory buffers managed by the data flow engine. They are subcomponents of the Data Flow task, which is just another task. It’s an important task, and it gets its own designer because of this, but it is just another task nonetheless.

Why is this important to me? Some (ok, most) may argue that I’m just being pedantic. And perhaps they’re right, to some degree. But any technical field has its own terminology that allows its practitioners to communicate efficiently and precisely. Hearing an SSIS developer use “task” for “transformation” is kind of like hearing a surgeon use “abdomen” for “stomach” or a .NET developer use “component” for “assembly.” The terms are related, but their meanings are very different. In each case it’s probably likely that the speaker knows the difference and is simply not speaking carefully, but sometimes (particularly with the surgeon ;-) you just want to be sure.

So the next time someone tells you that he’s using “the Lookup Task” please send him here.

P.S. I should also be sure to mention that the MCT who used this offending phrase probably does not need this rant to let him know what’s going on. MCTs tend to have a solid grasp of the technologies they teach; you tend to crash in burn in front of the classroom if this is not the case. But I knew that I was going to have this stuck in my mind forever if I didn’t say something somewhere…

Tuesday, September 29, 2009

SQL Server MVP Deep Dives

I’m not a SQL Server Most Valuable Professional anymore[1], but I was when this amazing project got underway, and I am proud to contribute as an author on SQL Server Integration Services. SQL Server MVP Deep Dives is a unique book. It has:

  • 53 authors
  • 59 chapters
  • Coverage of database design, development, administration and performance tuning
  • Coverage of the SQL Server BI suite: SSIS, SSAS and SSRS

But that’s not the coolest thing about this book. The coolest thing is where the money goes. All author royalties (that’s right – 100%) go to War Child International, a network of independent organizations, working across the world to help children affected by war.[2] So not only will this book help you become a better SQL Server practitioner, it will also help children around the world.

You can pre-order the book online today at the SQL Server MVP Deep Dives web site, and get immediate access to a pre-release PDF copy of the book. (Note: if you order before the end of September and use the code pop0928 at the checkout you will also save 50%!) Also, if you’re attending the PASS Summit in Seattle in early November, you can buy a copy at the conference bookstore and it signed by many of the authors. I’ll be there, and based on the email I’ve seen we’ll see a majority of the authors and editors should be there as well.

If you only buy one SQL Server book this year, why not make it this book? And since all proceeds go to benefit a great cause, why not buy two?

I’ll see you at PASS!

[1] You can’t be an MVP and a Microsoft employee at the same time, so when I joined Microsoft they took away my MVP status. Which was OK.

[2] Not, as some may think, a non-profit organization dedicated to helping aging Jethro Tull fans.

Monday, September 21, 2009

SSIS Package Templates: The Good, The Bad and The Ugly.

This is one of those “I could have sworn I blogged on this topic years ago” posts. I speak to this topic whenever I’m presenting on SSIS development best practices, and with my half-day mini-seminar in Stockholm last week[1] I was thinking of it – and speaking about it - again. Although SSIS is a great platform and provides pretty much all the tools you need to get the job done, sometimes it’s not always easy to know what tools to use and how to use them.

Which brings us, of course, to package templates. So before we continue, please read this Knowledge Base article on how to create a package template in BIDS.

Done? Good – let’s review, and paraphrase while we do it.

Package templates are pretty much the entire reuse story in SSIS.[2] You define the baseline package, including connection managers, log providers, configurations, variables and expressions, tasks, containers and precedence constraints and sometimes data flow components as well. When you create a new package you create a new copy of the template with this pre-defined starting point instead of starting with a blank package, so you can be sure that the package is just like the baseline you created.

In keeping with the theme established in the post title, this is the “good” side of SSIS package templates. They’re valuable, and when you’re building a large number of packages for the ETL subsystem of a DW/BI project, they’re invaluable, in that the common “plumbing” required by all packages can be defined once and used consistently everywhere. And this is particularly vital when working in a team environment, where different developers (each of whom inherently has his own style and own way of building packages) are working on the same project, which is typically the case on ETL projects of anything but the most trivial in scale.

Which brings us to the “bad” side of package templates, at least as implemented when following the guidance in the KB article referenced above. This guidance is firmly targeted at the lone developer; did you notice how all the instructions about creating a package and saving it in the Visual Studio install folder never mentioned on what computer you should be performing these steps? That’s because the article assumes the context is “your machine” – the lone developer’s machine or - at scale – on every developer’s machine.

Why is this bad, you ask? Surely it can’t be so much work to copy a DTSX file to the same folder on a few computers, can it?

Of course not. The “bad” side of following this approach comes not when creating the templates, but when the real world comes into play and the project continues to grow and evolve. In a perfect world, when the needs of the project are understood 100% before development begins, and the needs of the project never change after development begins, this approach will work just fine for team development. Your mileage may vary, but I’ve yet to work an SSIS project[3] like this.

In the real world, most successful BI projects are iterative and agile, just like most other successful development projects. If you look at SQL Server MVP John Welch or MVP and VSTS Ranger Ted Malone, their blogs focus strongly on agile BI. If you read Kimball, you will see an overarching theme of agile and iterative development. And if you’ve ever seen former SSIS program manager Michael Entin present on SSIS performance tuning[4] you’ve noticed this best practice under the heading of “Principles of Good Package Design”: “Follow Microsoft Development Guidelines - Iterative design, development & testing.”

Ok, so why did I just go off on that tangent and drop so many names? Regular readers will quickly note that my writing style practically revolves around tangents and will likely seize on this as the reason, but today it is to drive home a point: The real world is neither pretty nor neat, and people who know more about BI than I will ever forget[5] have gone out of their way to point this out.

Why is this bad, you ask yet again? Surely it can’t be so much work to copy a DTSX file to the same folder on a few computers, can it? And can you please just get to the point, Matthew?

Maybe. Picture this story:

A small team of five or six SSIS developers are working together on a project. The lead developer has created a package template for all developers on the team to use moving forward. He emails the DTSX file to each team member along with a reference to KB article 908018 for instructions on how to install and use the template.

A week later the lead developer identifies a problem with the package template.[6] He updates the template package and emails it out to his team with instructions to update their local machines with the new package right away.

Unfortunately, one of the team members is a “heads-down” kind of developer who turns off email during the day, and finishes a few more packages before he gets the message, even though he does save the new template package once he checks his email.

And another developer is on vacation when the email is sent, and by the time he gets back he has too many new messages in his inbox to ever get through, so he decides that if anything is really important then the sender will re-send it, and he just deletes anything that is unread to get a fresh start with no email backlog.[7] So he keeps on developing packages using the outdated template until the next update is sent out a few weeks later…

You see where this is going, right? This is gong to the “ugly” side of SSIS package templates. The ugly part comes in downstream when integration testing begins. The packages developed by the individual team members need to be deployed into a test environment and tested together, and of course at that point things start to fall apart. Everyone works extra hours and deadlines slip by while people try to track down problems they knew had been solved long ago. Although to be honest, the worst case scenario is that the test cases don’t find the problem at this point, and the ETL solution is deployed into production without the errors being resolved.

Unfortunately, I have seen this story play out in one way or another on several different projects over the years. The details are always different, but the ending is always the same: extra work, unhappy clients, and pain.

Now picture this story retold, with a few different details:

The lead developer works with his team to establish Team Foundation Server (TFS) source code control[8] before any development begins. He develops a set of packages that serve as “template examples” and checks them into TFS.

He also establishes standards with his team, so that each team member knows to Get Latest whenever starting work on a new package. This ensures that – no matter what happens – each developer always has the most recent versions of all packages with no need for “special case” tasks like updating a template.

The lead developer also instructs each development team member to create any new packages by right-clicking in Solution Explorer and selecting Add Existing Package, and then browsing to a current file from TFS.[9] This ensures that whenever a new package is created, it is being created as a new instance of a “rolling template” that is stored in TFS and as such the same version is always available for use by all developers.

Then, whenever problems are identified in existing packages, the “problem packages” are updated immediately, and the fixes are applied to any existing packages created from a now-outdated template.[10]

And whenever anyone goes on vacation, problems are found earlier, their impact is lessened, deadlines are met, and everyone lives happily ever after.

Ok, so the results are probably not so extreme, but in my experience on multiple projects this “rolling template” approach is the best practice way to approach SSIS package templates. Any attempt to utilize the Microsoft guidance in a team environment runs into problem after problem, and at the worst extreme I have seen a development team manually re-implement dozens of SSIS packages using a new template after churning for a full week while trying to reconcile and resolve the problems disseminated by using the “official” template approach.

And to be fair, the definition and implementation of policies and procedures can help make following the “official” approach less bad, but the starting point is inherently flawed for team development. If anyone were to suggest to me that following KB 908018 was the right way to approach package templates, I would have to politely[11] disagree.

What do you think? Even though I have been around the block a few times with SSIS, I don’t claim to know it all. I’d love to hear your thoughts on how you use package templates – especially if I took your name in vain earlier on - including what works and what does not work in your team environment.

 

[1] I never seem to find the time to do tourist stuff when traveling for work, but for some reason I just could not say no when I was invited to speak on SSIS best practices when in Sweden for a friend’s wedding. I have the world’s most understanding wife.

[2] I say “pretty much” because using small, modular packages and executing them via the Execute Package task or some other client, or using the SSIS .NET API to dynamically construct and execute packages are the other parts. In SQL Server 2008 and SQL Server 2008 there is not a lot to say about reuse in SSIS.

[3] Or projects using any other technology, for that matter. If you know of any projects like this and are looking for a consultant, I’m pretty sure I can work out a moonlighting arrangement with Microsoft. ;-)

[4] Or have seen me blatantly plagiarize his slide decks for my own talks.

[5] No Ted, this doesn’t include you. Sorry.

[6] I won’t go into the details of the problem because it isn’t really important to the story, but it could be anything along the lines of:

  a) A missing variable, which needed to be added.
  b) A variable defined at an incorrect scope, hiding a variable at the package scope when accessed from a Script task.
  c) A log provider needed to be added to support new project requirements.

You get the idea, right?

[7] Yes, people really do this. Yes, I’m thinking about doing it right now. I’ve been on vacation for a week and have been checking email at least once a day to respond to high priority items, but I still have nearly 700 unread messages in my inbox. ;-(

[8] Or Visual SourceSafe or pretty much any modern SCC system. But TFS has so many additional benefits, it’s hard to recommend anything else.

[9] Just to be clear, I’m not implying that there is any ability to browse TFS repositories in BIDS, but instead am saying that the developers all have the most recent packages from TFS and can browse to them on their local file systems.

[10] Odds are they use a tool like PacMan to batch update the existing packages. ;-)

[11] Or perhaps not so politely depending on the hour…

Friday, May 22, 2009

Coolest. Tool. Ever.

I’m struggling with how to best start this post. Here are a few possibilities:

  • I like to think that I’m a hard person to impress. But…
  • I swear at my computer many times a day, but usually in anger and almost never in delighted surprise.
  • Someone deserves a raise.
  • Holy crap!

All of them apply, so you can take your pick.

But what am I talking about?

I’m talking about the DocumentReflector tool that’s included with the Open XML Format SDK v2.0.

You see, I’ve been playing today with a few Office development tasks today for a work-related project. I’ve never looked too closely at working with the Office Open XML file format before this week; I’ve read white papers and such, but until you start writing your own code you never really “get” what an API is all about. After a few false starts and many searches, I was less than impressed.

But then I discovered DocumentReflector. This is a stand-alone SDK tool that will (wait for it) let you open an Office Open XML document, select a subset of the document or the entire document and it (here it comes) displays the C# code that will create it.

Holy crap indeed.

I’m sure I’ll be swearing at this amazing tool before the weekend is over (the new wears off pretty quickly for me) but today whoever built this app is officially my hero.

Edit: I almost forgot to add – there’s a great overview of using this tool on Erika Ehrli’s blog, here.

Now I can’t help but ask why we don’t have a tool like this for SSIS packages. Wouldn’t it be great to point a utility at a DTSX file and have it output the code required to build that package through the SSIS API? ;-)

Tuesday, May 19, 2009

Spatial Demo Goodness

If you’re someone who presents on SQL Server topics, you have probably run into something of a wall when it comes to getting interesting spatial data sets to demonstrate. The spatial data included with the SQL Server 2008 sample databases is functional, but not particularly complex or interesting. There is free spatial data available for many different sources online, but it tends to be difficult to find, in different formats, and annoyingly difficult to load into SQL Server.[1] And of course, the not-free spatial content out there tends to be really, really not-free, and while it may make sense to pay a premium price if you are developing premium software, but for demo purposes this is generally a non-starter.

Enter GeoNames.

GeoNames is an open source provider of spatial data. Essentially they have many disparate sources of free spatial content and have aggregated them into a single location, with many different access methods. They support web service access (and publish a nice set of client libraries too) which is nice for direct application integration, but to me the cool factor comes from the ability to download text dumps of the whole database or just the countries you want. Because then you can load the data into SQL Server 2008 and let the demo goodness begin.

And Ed Katibah, PM for the SQL Server spatial team, has posted instructions for loading GeoNames data into SQL Server 2008. It’s great to have these steps documented because there are quite a few of them, but hopefully you’ll only need to perform them once.

So if you have been waiting for great spatial data that’s available for free, wait no longer.

I should also point out that I became aware of this cool resource not based on my own hard work and research, but instead because of the excellent Simple Talk newsletter that Red Gate Software produces. And I should probably mention that the primary reason I blogged about it is that my friend and colleague, senior SQL Server trainer and all-around good guy Chris Randall has been working on building better spatial demo sets, and I’ve heard rumors that  he occasionally reads this blog. Hopefully this will save him (and you, and me) some work…

 

[1] Please keep in mind that I don’t claim to be a SQL Server spatial expert, so what is “annoyingly difficult” for me may be “exceptionally simple” for someone with more experience, but it likely to be “annoyingly difficult” for many people.

Thursday, April 9, 2009

More on Package Configurations

Douglas Laudenschlager from the SSIS documentation team has posted an excellent article to his unofficial SSIS blog on how package configurations are applied in SQL Server 2005 and SQL Server 2008 Integration services. Check it out here.

I don’t often post on my blog simply to reference someone else’s original work, but I wanted to do it today because:

  • Package configurations are a vitally important feature that are often misunderstood.
  • The manner in which configurations are applied has changed between SQL Server 2005 and SQL Server 2008.
  • These changes can cause changes in package behavior when executing packages with specific DTEXEC options

So if you are moving from SQL Server 2005 to SQL Server 2008, definitely be sure to read about these changes.