Wednesday, October 7, 2009

Transformations are NOT Tasks

This is another one of those “I thought I blogged this years ago” posts. I was having a conversation with a fellow MCT a few days ago, and he described SSIS data flow transformations as “tasks on the data flow tab.” I didn’t say anything then, but this phrase has been sticking in my mind ever since – not unlike a nasty splinter – and I think I need to get it written down before my brain lets me move on.

Why is this seemingly innocuous phrase causing me so much mental distress?

Because transformations are not tasks!

Let’s list a few characteristics of tasks in SSIS. Tasks:

  • Are used in the SSIS Control Flow.
  • Are Containers, with all of the properties (such as the ability to control transaction scope, or to be enabled or disabled) of containers.
  • Support property expressions.
  • Support breakpoints and debugging.
  • Can be individually configured for logging.
  • Are presented as boxes connected by arrows on the BIDS design surface.

The only characteristic that data flow transformations share with tasks is the last one: they’re presented as boxes, connected by arrows.

But the similarities end there. Transformations are metadata-bound functions that operate on data stored in the memory buffers managed by the data flow engine. They are subcomponents of the Data Flow task, which is just another task. It’s an important task, and it gets its own designer because of this, but it is just another task nonetheless.
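
If you prefer to see the difference in code rather than prose, here is a minimal sketch using the SQL Server 2008 SSIS runtime and pipeline APIs (the Microsoft.SqlServer.Dts.Runtime and Microsoft.SqlServer.Dts.Pipeline.Wrapper namespaces). This is just an illustration, not production code, and the component ProgID shown is an assumption that varies by version – but notice that the Data Flow task is added to the package just like any other task, while the transformation exists only inside the pipeline that the task hosts.

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class TasksAreNotTransformations
{
    static void Main()
    {
        Package package = new Package();

        // The Data Flow task is "just another task": an Executable in the control flow.
        Executable exec = package.Executables.Add("STOCK:PipelineTask");
        TaskHost dataFlowTask = (TaskHost)exec;
        dataFlowTask.Name = "My Data Flow Task";

        // Transformations are not tasks. They are components that live *inside*
        // the Data Flow task, as metadata in the pipeline the task hosts.
        MainPipe pipeline = (MainPipe)dataFlowTask.InnerObject;
        IDTSComponentMetaData100 transform = pipeline.ComponentMetaDataCollection.New();
        transform.ComponentClassID = "DTSTransform.DerivedColumn";  // ProgID shown for illustration; exact IDs vary by version
        transform.Name = "Derived Column (a component, not a task)";
    }
}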

Why is this important to me? Some (ok, most) may argue that I’m just being pedantic. And perhaps they’re right, to some degree. But any technical field has its own terminology that allows its practitioners to communicate efficiently and precisely. Hearing an SSIS developer use “task” for “transformation” is kind of like hearing a surgeon use “abdomen” for “stomach” or a .NET developer use “component” for “assembly.” The terms are related, but their meanings are very different. In each case it’s likely that the speaker knows the difference and is simply not speaking carefully, but sometimes (particularly with the surgeon ;-) you just want to be sure.

So the next time someone tells you that he’s using “the Lookup Task,” please send him here.

P.S. I should also be sure to mention that the MCT who used this offending phrase probably does not need this rant to let him know what’s going on. MCTs tend to have a solid grasp of the technologies they teach; you tend to crash and burn in front of the classroom if this is not the case. But I knew that I was going to have this stuck in my mind forever if I didn’t say something somewhere…

Tuesday, September 29, 2009

SQL Server MVP Deep Dives

I’m not a SQL Server Most Valuable Professional anymore[1], but I was when this amazing project got underway, and I am proud to contribute as an author on SQL Server Integration Services. SQL Server MVP Deep Dives is a unique book. It has:

  • 53 authors
  • 59 chapters
  • Coverage of database design, development, administration and performance tuning
  • Coverage of the SQL Server BI suite: SSIS, SSAS and SSRS

But that’s not the coolest thing about this book. The coolest thing is where the money goes. All author royalties (that’s right – 100%) go to War Child International, a network of independent organizations, working across the world to help children affected by war.[2] So not only will this book help you become a better SQL Server practitioner, it will also help children around the world.

You can pre-order the book online today at the SQL Server MVP Deep Dives web site, and get immediate access to a pre-release PDF copy of the book. (Note: if you order before the end of September and use the code pop0928 at the checkout you will also save 50%!) Also, if you’re attending the PASS Summit in Seattle in early November, you can buy a copy at the conference bookstore and have it signed by many of the authors. I’ll be there, and based on the email I’ve seen, a majority of the authors and editors will be there as well.

If you only buy one SQL Server book this year, why not make it this book? And since all proceeds go to benefit a great cause, why not buy two?

I’ll see you at PASS!

[1] You can’t be an MVP and a Microsoft employee at the same time, so when I joined Microsoft they took away my MVP status. Which was OK.

[2] Not, as some may think, a non-profit organization dedicated to helping aging Jethro Tull fans.

Monday, September 21, 2009

SSIS Package Templates: The Good, The Bad and The Ugly.

This is one of those “I could have sworn I blogged on this topic years ago” posts. I speak to this topic whenever I’m presenting on SSIS development best practices, and with my half-day mini-seminar in Stockholm last week[1] I was thinking of it – and speaking about it – again. Although SSIS is a great platform and provides pretty much all the tools you need to get the job done, it’s not always easy to know which tools to use and how to use them.

Which brings us, of course, to package templates. So before we continue, please read this Knowledge Base article on how to create a package template in BIDS.

Done? Good – let’s review, and paraphrase while we do it.

Package templates are pretty much the entire reuse story in SSIS.[2] You define the baseline package, including connection managers, log providers, configurations, variables and expressions, tasks, containers and precedence constraints and sometimes data flow components as well. When you create a new package you create a new copy of the template with this pre-defined starting point instead of starting with a blank package, so you can be sure that the package is just like the baseline you created.

In keeping with the theme established in the post title, this is the “good” side of SSIS package templates. They’re valuable, and when you’re building a large number of packages for the ETL subsystem of a DW/BI project, they’re invaluable, in that the common “plumbing” required by all packages can be defined once and used consistently everywhere. And this is particularly vital when working in a team environment, where different developers (each of whom inherently has his own style and own way of building packages) are working on the same project, which is typically the case on all but the most trivial ETL projects.

Which brings us to the “bad” side of package templates, at least as implemented when following the guidance in the KB article referenced above. This guidance is firmly targeted at the lone developer; did you notice how all the instructions about creating a package and saving it in the Visual Studio install folder never mentioned on what computer you should be performing these steps? That’s because the article assumes the context is “your machine” – the lone developer’s machine or - at scale – on every developer’s machine.

Why is this bad, you ask? Surely it can’t be so much work to copy a DTSX file to the same folder on a few computers, can it?

Of course not. The “bad” side of following this approach comes not when creating the templates, but when the real world comes into play and the project continues to grow and evolve. In a perfect world, when the needs of the project are understood 100% before development begins, and the needs of the project never change after development begins, this approach will work just fine for team development. Your mileage may vary, but I’ve yet to work on an SSIS project[3] like this.

In the real world, most successful BI projects are iterative and agile, just like most other successful development projects. If you look at the blogs of SQL Server MVP John Welch or MVP and VSTS Ranger Ted Malone, you’ll see a strong focus on agile BI. If you read Kimball, you will see an overarching theme of agile and iterative development. And if you’ve ever seen former SSIS program manager Michael Entin present on SSIS performance tuning[4] you’ve noticed this best practice under the heading of “Principles of Good Package Design”: “Follow Microsoft Development Guidelines - Iterative design, development & testing.”

Ok, so why did I just go off on that tangent and drop so many names? Regular readers will quickly note that my writing style practically revolves around tangents and will likely seize on this as the reason, but today it is to drive home a point: The real world is neither pretty nor neat, and people who know more about BI than I will ever forget[5] have gone out of their way to point this out.

Why is this bad, you ask yet again? Surely it can’t be so much work to copy a DTSX file to the same folder on a few computers, can it? And can you please just get to the point, Matthew?

Maybe. Picture this story:

A small team of five or six SSIS developers are working together on a project. The lead developer has created a package template for all developers on the team to use moving forward. He emails the DTSX file to each team member along with a reference to KB article 908018 for instructions on how to install and use the template.

A week later the lead developer identifies a problem with the package template.[6] He updates the template package and emails it out to his team with instructions to update their local machines with the new package right away.

Unfortunately, one of the team members is a “heads-down” kind of developer who turns off email during the day, and finishes a few more packages before he gets the message, even though he does save the new template package once he checks his email.

And another developer is on vacation when the email is sent, and by the time he gets back he has too many new messages in his inbox to ever get through, so he decides that if anything is really important then the sender will re-send it, and he just deletes anything that is unread to get a fresh start with no email backlog.[7] So he keeps on developing packages using the outdated template until the next update is sent out a few weeks later…

You see where this is going, right? This brings us to the “ugly” side of SSIS package templates. The ugly part comes in downstream when integration testing begins. The packages developed by the individual team members need to be deployed into a test environment and tested together, and of course at that point things start to fall apart. Everyone works extra hours and deadlines slip by while people try to track down problems they knew had been solved long ago. Although to be honest, the worst case scenario is that the test cases don’t find the problem at this point, and the ETL solution is deployed into production without the errors being resolved.

Unfortunately, I have seen this story play out in one way or another on several different projects over the years. The details are always different, but the ending is always the same: extra work, unhappy clients, and pain.

Now picture this story retold, with a few different details:

The lead developer works with his team to establish Team Foundation Server (TFS) source code control[8] before any development begins. He develops a set of packages that serve as “template examples” and checks them into TFS.

He also establishes standards with his team, so that each team member knows to Get Latest whenever starting work on a new package. This ensures that – no matter what happens – each developer always has the most recent versions of all packages with no need for “special case” tasks like updating a template.

The lead developer also instructs each development team member to create any new packages by right-clicking in Solution Explorer and selecting Add Existing Package, and then browsing to a current file from TFS.[9] This ensures that whenever a new package is created, it is being created as a new instance of a “rolling template” that is stored in TFS and as such the same version is always available for use by all developers.

Then, whenever problems are identified in existing packages, the “problem packages” are updated immediately, and the fixes are applied to any existing packages created from a now-outdated template.[10]

And whenever anyone goes on vacation, problems are found earlier, their impact is lessened, deadlines are met, and everyone lives happily ever after.

Ok, so the results are probably not so extreme, but in my experience on multiple projects this “rolling template” approach is the best practice way to approach SSIS package templates. Any attempt to utilize the Microsoft guidance in a team environment runs into problem after problem, and at the worst extreme I have seen a development team manually re-implement dozens of SSIS packages using a new template after churning for a full week while trying to reconcile and resolve the problems disseminated by using the “official” template approach.

And to be fair, the definition and implementation of policies and procedures can help make following the “official” approach less bad, but the starting point is inherently flawed for team development. If anyone were to suggest to me that following KB 908018 was the right way to approach package templates, I would have to politely[11] disagree.

What do you think? Even though I have been around the block a few times with SSIS, I don’t claim to know it all. I’d love to hear your thoughts on how you use package templates – especially if I took your name in vain earlier on - including what works and what does not work in your team environment.

 

[1] I never seem to find the time to do tourist stuff when traveling for work, but for some reason I just could not say no when I was invited to speak on SSIS best practices when in Sweden for a friend’s wedding. I have the world’s most understanding wife.

[2] I say “pretty much” because using small, modular packages and executing them via the Execute Package task or some other client, or using the SSIS .NET API to dynamically construct and execute packages are the other parts. In SQL Server 2005 and SQL Server 2008 there is not a lot to say about reuse in SSIS.

[3] Or projects using any other technology, for that matter. If you know of any projects like this and are looking for a consultant, I’m pretty sure I can work out a moonlighting arrangement with Microsoft. ;-)

[4] Or have seen me blatantly plagiarize his slide decks for my own talks.

[5] No Ted, this doesn’t include you. Sorry.

[6] I won’t go into the details of the problem because it isn’t really important to the story, but it could be anything along the lines of:

  a) A missing variable, which needed to be added.
  b) A variable defined at an incorrect scope, hiding a variable at the package scope when accessed from a Script task.
  c) A log provider that needed to be added to support new project requirements.

You get the idea, right?

[7] Yes, people really do this. Yes, I’m thinking about doing it right now. I’ve been on vacation for a week and have been checking email at least once a day to respond to high priority items, but I still have nearly 700 unread messages in my inbox. ;-(

[8] Or Visual SourceSafe or pretty much any modern SCC system. But TFS has so many additional benefits, it’s hard to recommend anything else.

[9] Just to be clear, I’m not implying that there is any ability to browse TFS repositories in BIDS, but instead am saying that the developers all have the most recent packages from TFS and can browse to them on their local file systems.

[10] Odds are they use a tool like PacMan to batch update the existing packages. ;-)

[11] Or perhaps not so politely depending on the hour…

Friday, May 22, 2009

Coolest. Tool. Ever.

I’m struggling with how to best start this post. Here are a few possibilities:

  • I like to think that I’m a hard person to impress. But…
  • I swear at my computer many times a day, but usually in anger and almost never in delighted surprise.
  • Someone deserves a raise.
  • Holy crap!

All of them apply, so you can take your pick.

But what am I talking about?

I’m talking about the DocumentReflector tool that’s included with the Open XML Format SDK v2.0.

You see, I’ve been playing with a few Office development tasks today for a work-related project. I’ve never looked too closely at working with the Office Open XML file format before this week; I’ve read white papers and such, but until you start writing your own code you never really “get” what an API is all about. After a few false starts and many searches, I was less than impressed.

But then I discovered DocumentReflector. This is a stand-alone SDK tool that will (wait for it) let you open an Office Open XML document, select the entire document or just a subset of it, and (here it comes) display the C# code that will create it.
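
To give you a feel for the kind of code we’re talking about, here’s a hand-written sketch in the style of what the tool generates, using the Open XML SDK v2.0 classes (DocumentFormat.OpenXml). This isn’t actual DocumentReflector output – just the general shape of it – and the file name and text are made up.

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class CreateSimpleDocument
{
    static void Main()
    {
        // Create a minimal .docx containing a single paragraph of text.
        using (WordprocessingDocument doc = WordprocessingDocument.Create(
            "Sample.docx", WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document =
                new Document(
                    new Body(
                        new Paragraph(
                            new Run(
                                new Text("Hello, Open XML!")))));
            mainPart.Document.Save();
        }
    }
}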

Holy crap indeed.

I’m sure I’ll be swearing at this amazing tool before the weekend is over (the new wears off pretty quickly for me) but today whoever built this app is officially my hero.

Edit: I almost forgot to add – there’s a great overview of using this tool on Erika Ehrli’s blog, here.

Now I can’t help but ask why we don’t have a tool like this for SSIS packages. Wouldn’t it be great to point a utility at a DTSX file and have it output the code required to build that package through the SSIS API? ;-)

Tuesday, May 19, 2009

Spatial Demo Goodness

If you’re someone who presents on SQL Server topics, you have probably run into something of a wall when it comes to getting interesting spatial data sets to demonstrate. The spatial data included with the SQL Server 2008 sample databases is functional, but not particularly complex or interesting. There is free spatial data available from many different sources online, but it tends to be difficult to find, in different formats, and annoyingly difficult to load into SQL Server.[1] And of course, the not-free spatial content out there tends to be really, really not-free, and while it may make sense to pay a premium price if you are developing premium software, for demo purposes this is generally a non-starter.

Enter GeoNames.

GeoNames is an open source provider of spatial data. Essentially they have taken many disparate sources of free spatial content and aggregated them into a single location, with many different access methods. They support web service access (and publish a nice set of client libraries too) which is nice for direct application integration, but to me the cool factor comes from the ability to download text dumps of the whole database or just the countries you want. Because then you can load the data into SQL Server 2008 and let the demo goodness begin.

And Ed Katibah, PM for the SQL Server spatial team, has posted instructions for loading GeoNames data into SQL Server 2008. It’s great to have these steps documented because there are quite a few of them, but hopefully you’ll only need to perform them once.
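
Ed’s post is the authoritative walkthrough, but the general shape of the final step looks something like this: once the tab-delimited GeoNames dump has been bulk loaded into a staging table, you turn the latitude/longitude pairs into geography instances. This is only a rough sketch – the database, table, and column names below are made up for illustration.

using System.Data.SqlClient;

class AddGeographyColumn
{
    static void Main()
    {
        string connectionString =
            "Data Source=.;Initial Catalog=SpatialDemo;Integrated Security=SSPI;";  // hypothetical database

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Add a geography column to the staging table that holds the GeoNames dump.
            using (SqlCommand cmd = new SqlCommand(
                "ALTER TABLE dbo.GeoNamesStaging ADD GeoLocation geography NULL;", conn))
            {
                cmd.ExecuteNonQuery();
            }

            // Build a geography point from each latitude/longitude pair (SRID 4326 = WGS 84).
            using (SqlCommand cmd = new SqlCommand(
                @"UPDATE dbo.GeoNamesStaging
                  SET GeoLocation = geography::Point(Latitude, Longitude, 4326)
                  WHERE Latitude IS NOT NULL AND Longitude IS NOT NULL;", conn))
            {
                cmd.CommandTimeout = 0;  // the full GeoNames dump is large; this can take a while
                cmd.ExecuteNonQuery();
            }
        }
    }
}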

So if you have been waiting for great spatial data that’s available for free, wait no longer.

I should also point out that I became aware of this cool resource not based on my own hard work and research, but instead because of the excellent Simple Talk newsletter that Red Gate Software produces. And I should probably mention that the primary reason I blogged about it is that my friend and colleague, senior SQL Server trainer and all-around good guy Chris Randall has been working on building better spatial demo sets, and I’ve heard rumors that he occasionally reads this blog. Hopefully this will save him (and you, and me) some work…

 

[1] Please keep in mind that I don’t claim to be a SQL Server spatial expert, so what is “annoyingly difficult” for me may be “exceptionally simple” for someone with more experience, but it is likely to be “annoyingly difficult” for many people.

Thursday, April 9, 2009

More on Package Configurations

Douglas Laudenschlager from the SSIS documentation team has posted an excellent article to his unofficial SSIS blog on how package configurations are applied in SQL Server 2005 and SQL Server 2008 Integration Services. Check it out here.

I don’t often post on my blog simply to reference someone else’s original work, but I wanted to do it today because:

  • Package configurations are a vitally important feature that are often misunderstood.
  • The manner in which configurations are applied has changed between SQL Server 2005 and SQL Server 2008.
  • These changes can alter package behavior when packages are executed with specific DTEXEC options.

So if you are moving from SQL Server 2005 to SQL Server 2008, definitely be sure to read about these changes.

Sunday, April 5, 2009

Visual Studio Window Layout for SSIS Development

My friend and colleague Lynn Langit has a habit of blogging where she posts an image and a brief question and that’s it. I’m very jealous, because I can’t seem to write a blog post that’s shorter than 20 pages or so.

So in this post I’m trying to be More Like Lynn.

This is how I configure my Visual Studio development environment for SSIS development:

SSIS Window Layout

(Click on the thumbnail for a larger image.)

How do you configure your windows?

Ok, so now if I were More Like Lynn, that would be the whole post. But since I haven’t had a mango martini in far too long, I feel compelled to elaborate:

  1. I dock the Variables window in the upper left corner in its own tab group.
  2. I dock the Toolbox window in the lower left corner in a different tab group.
    (This lets me see both the Variables and the Toolbox at the same time during package development, which is often very useful.)
  3. I dock the Log Events window in the same tab group as the Toolbox, because you generally don’t use them both at the same time.
  4. I dock the Solution Explorer window in the upper right corner.
  5. I dock the Team Explorer window in the same tab group as Solution Explorer.
  6. I dock the Properties window in the lower right corner in its own tab group.
  7. I dock the Error List window on the bottom.
  8. I dock the Output window on the bottom in the same tab group.
  9. I dock the Pending Changes window on the bottom in the same tab group as the Error List and the Output windows.
  10. I leave all of the designers in the middle, since this is where they always go. :-)

This layout gives me ready access to all of the tools I need, when I need them. And I would still love to know how you lay out your windows in your SSIS development environment…

SSIS and the Package ProtectionLevel Property

This is another one of those “I could have sworn I blogged on this years ago” posts.[1] I was looking for something I’d already written that discussed the ProtectionLevel property of the SSIS Package, but search as I might I could not find it. So I guess it’s time to write it, eh?

(As a note for the lazy reader, the moral of the story is that you should use the DontSaveSensitive ProtectionLevel for your packages, and use package configurations to apply the sensitive property values to your packages. Feel free to read the next five pages, but it all boils down to this.)

First of all, let’s start off by reviewing the property itself. Take a look at the properties of any SSIS package and you’ll see it there:

ProtectionLevel00

Notice the description there at the bottom of the Properties window? “Specifies the protection mode and the protection method of the package.” Hey – that’s pretty self-explanatory, so maybe we don’t need a blog post after all. Right?

Maybe not.

Before we continue, why not take a look at what SQL Server Books Online (BOL) has to say about this property?

Ok, so you’ve read the docs. Now let’s review:

  • Component developers get to specify which properties of their components are considered sensitive.
  • For the built-in components, this includes the password property of connection managers, some system variables and the XML nodes in the underlying DTSX file that store this information.
  • SSIS users (that means us, the package developers) cannot control what is considered sensitive.

Why does this matter?

Well, SSIS is designed to be secure. One aspect of this is that any “sensitive” information that is part of your package (think about it – many packages include information like passwords in connection strings for connecting to databases and legacy systems where valuable data is stored) needs to be protected. And by protected, we mean never, ever stored in clear text.

That’s right, donkey – never, ever.[3]

But wait, you say – I’ve built many SSIS packages and I’ve never been prompted to encrypt any sensitive anything. How am I supposed to believe you?

Good question. Like most developers[4], the fine members of the SSIS product group wanted to avoid irritating their users.[5] Because of this, they wanted to make protecting sensitive data seamless, even transparent. So they did. Take another look at that screen shot of the Properties window above. Do you notice the EncryptSensitiveWithUserKey option? That’s the default.

This means that, by default, SSIS uses the Data Protection API (DPAPI) to encrypt any sensitive data with the user key of the user who saves the package. Every Windows user has a user key, so this is pretty straightforward. Then, when the user goes to open the package again, the encrypted data is automatically decrypted without the user needing to do anything. Pretty slick, eh?

Yep. Pretty slick. Especially for SSIS developers who live in a world where they never work in teams and never have their packages run in an unattended batch job that uses a service account.

Unfortunately, I don’t live in that world, and neither do many of the SSIS developers I know.

Ok, let’s be fair. That sounded awfully critical, but it wasn’t intended to be. I generally don’t like to criticize unless I can suggest a better alternative, and in this context I don’t see a more reasonable default behavior. It just works, and it lets developers be productive with SSIS without needing to worry about manually configuring security settings.

The criticism that I have is that people just don’t seem to understand this, and that causes problems. If one developer didn’t understand it, then it’s probably his fault. But if dozens or hundreds of people have this same problem (and they do – just take a look at the SSIS forums on MSDN to see how often questions related to the ProtectionLevel property come up) then the communication about the feature hasn’t been what it could be.

So what is the problem, and how does it show up? Well, consider one of these situations:

  • The package developer checks his package into source control[6] and another developer checks it out to make changes. The second developer can open the package, but gets an error saying “Failed to decrypt protected XML node "DTS:Password"” and the package will not execute.
  • The package developer deploys the package to the production server and the DBA schedules it to execute via a SQL Server Agent job. When the job is run, the job step fails with an error saying “Failed to decrypt protected XML node "DTS:Password"” and the package will not execute.

I won’t go on. You probably see the trend already. With the default settings, initial development is easy, but downstream tasks are more difficult. Only the user who saves the package can edit and execute it without error. This is probably not what we want.
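
If you want to see the user-key behavior for yourself, here’s a minimal sketch using the .NET wrapper for DPAPI (System.Security.Cryptography.ProtectedData, in System.Security.dll). To be clear, this is not what SSIS does internally – it’s just the same user-key concept, boiled down, with a made-up sensitive value.

using System;
using System.Security.Cryptography;
using System.Text;

class UserKeyDemo
{
    static void Main()
    {
        byte[] sensitive = Encoding.UTF8.GetBytes("Password=SuperSecret;");  // made-up sensitive value

        // Encrypt with the current user's key -- conceptually what
        // EncryptSensitiveWithUserKey does to the sensitive XML nodes.
        byte[] protectedBytes = ProtectedData.Protect(
            sensitive, null, DataProtectionScope.CurrentUser);

        // The same user on the same machine can decrypt it transparently...
        byte[] unprotected = ProtectedData.Unprotect(
            protectedBytes, null, DataProtectionScope.CurrentUser);
        Console.WriteLine(Encoding.UTF8.GetString(unprotected));

        // ...but run the Unprotect call as a different user (a teammate, or the
        // SQL Server Agent service account) and you get a CryptographicException
        // instead -- which is exactly the "Failed to decrypt" story above.
    }
}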

What about the other options? Let’s consider:

  • EncryptAllWithUserKey – This option has all of the drawbacks of the default EncryptSensitiveWithUserKey option, but has the additional drawback of encrypting the whole darned package, so not only can no other user execute the package without re-entering the sensitive data, no other user can even open the package. Probably not what you want.
  • EncryptSensitiveWithPassword – This option relies on the package developer entering a password for the package, and then entering that password every time the package is opened or executed. Um… right. I don’t know about you, but my real-world SSIS projects tend to involve many dozens of packages. And I typically have a dozen or more open in Visual Studio at any given time. So picture me staggering to my desk on a blurry Monday morning, desperately clutching my mug of coffee, and launching Visual Studio. Visual Studio loves me, so it has remembered the packages I had open when I shut down for the weekend, and in a display of digital affection it re-opens them all.[7] Now picture the look on my face as I am prompted to enter the &^%$%@(* package $#^%#& password for every &@%$(^@ package as it opens. (And for that matter, picture the look on the faces of the poor people from accounting down the hall as they run for the fire escape.) And to make matters worse, this option pretty much negates the security it’s designed to provide, because every developer on the team needs to know the package password, and at least one developer[8] will put that password on his white board or on a Post-It Note.
  • EncryptAllWithPassword – Just when you thought it was safe to go back into the Properties window, here comes an option which combines the worst of all worlds. Not only do you have to enter the &%#*& password every time you open the package, the whole package contents are also encrypted so if you forget the password[9] you’re pretty much SOL[10].

So what does that leave us? That leaves us with two options. The first is the one that I use and advocate: the DontSaveSensitive option.

This option, like the others we’ve seen so far, is pretty much self-describing. It protects the sensitive data by simply not saving it in the package file. What’s not to like? No password, no user keys, everyone can open the file without error. It’s a little slice of perfection in our imperfect world!

Maybe.

Maybe not.

The difficulty, of course, comes in when you actually want to execute the package. The package can’t run if it doesn’t have a valid connection string to connect to that legacy database server from which it extracts data, now can it? So where does the sensitive data get stored, if not in the package? The answer is that we, the package developers, need to store that sensitive data in package configurations. This could be XML config files, a SQL Server database, or the Windows Registry – there are many options. And each one of them can be secured in its own way, through database permissions, NTFS permissions or Registry ACLs.
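
For the curious, here’s what this looks like if you do it through the SSIS runtime API instead of the BIDS Package Configurations wizard – a minimal sketch, with made-up file paths, just to show the two moving parts: the ProtectionLevel property and a configuration that supplies the sensitive values at run time.

using Microsoft.SqlServer.Dts.Runtime;

class DontSaveSensitiveSetup
{
    static void Main()
    {
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\Packages\LoadCustomers.dtsx", null);  // hypothetical package

        // Don't persist sensitive values in the DTSX file at all...
        package.ProtectionLevel = DTSProtectionLevel.DontSaveSensitive;

        // ...and supply them at run time from a package configuration instead.
        package.EnableConfigurations = true;
        Configuration config = package.Configurations.Add();
        config.ConfigurationType = DTSConfigurationType.ConfigFile;
        config.ConfigurationString = @"C:\Config\LoadCustomers.dtsConfig";  // hypothetical config file

        app.SaveToXml(@"C:\Packages\LoadCustomers.dtsx", package, null);
    }
}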

The problem that some developers have with this is that it doesn’t let them get down to doing what they want to do the most, which is build their packages. And what do we say to those developers?

We say “tough luck!”

We don’t have much sympathy for those developers, because they’re missing the big picture. Yes, it can slow things down up front to have to add configurations to your packages, but this saves a ton of time later on. Configurations are a required feature for packages that will need to be deployed to multiple environments (and unless your packages will only ever be run by you on your machine, that means your packages), and by adding them right at the beginning of package development you will have a much greater chance of a successful deployment than if you wait until deployment time to start thinking about it.

So this is what I recommend that SSIS developers do when building their packages:

  • Set the ProtectionLevel property to DontSaveSensitive
  • Add a package configuration for each connection string or other sensitive property in the package.
  • Test everything and make sure that it works.
  • Save the package and use it as a template package moving forward.[11]

Ahhh… life is good. Life is so much simpler when you have prescriptive guidance.

But wait… What about that last option? What about ServerStorage? Should we ever use that one?

ServerStorage can only be used when deploying packages into a SQL Server database[12]. If you deploy to SQL, then you can rely on the database to protect your sensitive data. But as you probably know, this doesn’t apply to package development. Visual Studio can only work with DTSX files on the file system – it cannot open packages from or save packages to SQL Server. This means that ServerStorage is a viable option once package development is complete, but not during development. During development you want to use DontSaveSensitive, even if you are planning to deploy to SQL.
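
As a point of reference, deploying to SQL Server – the scenario where ServerStorage comes into play – looks something like this through the runtime API. Again, this is only a sketch, with a made-up package path and server name, using Windows authentication.

using Microsoft.SqlServer.Dts.Runtime;

class DeployToMsdb
{
    static void Main()
    {
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\Packages\LoadCustomers.dtsx", null);  // hypothetical package

        // Once development is done, saving to MSDB lets SQL Server protect the
        // package contents -- this is where ServerStorage applies.
        app.SaveToSqlServer(package, null, "MyServer", null, null);  // Windows authentication; server name is made up
    }
}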

So there you have it. I’ve spent five pages[13] saying “use the DontSaveSensitive package ProtectionLevel option when you develop SSIS packages.” But even though it took longer to read, hopefully you’ll have a better idea of why you should choose this option than if I’d just come out and told you…

 

[1] Just as a side note, re-read the subject line. Doesn’t it sound like the name of a children’s story? It was all that I could do to not write this post as an AeSISSop’s Fable or modern cautionary tale[2].

[2] I also started putting it to the tune of the Gilligan’s Island theme before I caught myself. I haven’t been feeling well lately, you see…

[3] Say it in your best Shrek voice and it may make sense. Especially in the fable context.

[4] Read the rest of the sentence and then have a good laugh with me, ok?

[5] Wait, then wipe the tears from your eyes, and then start reading again.

[6] And we all use source control, right?

[7] Awww….

[8] You know who I’m talking about.

[9] And can’t find your way down the hall to Bob’s white board.

[10] That’s “Server Offline” to you.

[11] There will likely be other things that need to be set up in your template package too, but this is a good start.

[12] I personally prefer file system deployments and so do most of the SSIS professionals I know. Check out this post from Jamie Thomson where he discusses this topic, and make sure you click on the link at the top of his post too.

[13] And if memory serves me, set a new record for the number of footnotes in a single blog post.

Monday, March 30, 2009

Hey, That Was Fun!

As you probably know, I’m a Microsoft Certified Trainer (MCT). As you may not know, MCTs are required to renew their certification each year, and the 2009 renewal period is currently underway. I got my MCT 2009 welcome kit a week or so ago.

And with that welcome kit came a nice surprise – a voucher for one free Microsoft Certified Application Specialist (MCAS) exam.

Now, if you’re like me, you probably like to think of yourself as being a geek. Someone who builds software that has never existed before, and who dominates the poor software that is unlucky enough to cross your path. (Or something like that anyway.) So although I had been aware of the MCAS certification for some time, I had never really considered trying to achieve it. But hey – free is free, right?

So last Friday I scheduled my exam (77-601: Using Microsoft Office Word 2007) for this morning. I was going to be out of the office for a dentist appointment anyway, and the test center is on the way home, so why not, right?

Well, to make a long story short, I passed the exam and had a great time doing it. The exam was delivered in Word 2007 – each question gave me a list of things to accomplish and I then used Word 2007 to achieve the required goals. It didn’t matter what steps you used (I’m a big keyboard shortcut user and don’t use the Ribbon too often) so long as you achieved the requirements as written.

The thing that struck me here was that this exam really tested my ability to use Word 2007 as a “power user.” Not just someone who typed documents, but someone who knows what features are there and who knows how to use them. I doubt that I’ll stop taking my technical certification exams any time soon, but I think that I may just need to try my hand at the Excel 2007 and PowerPoint 2007 exams too…

Saturday, March 28, 2009

SSIS Development and Source Code Control

In my personal best practices for SSIS development, I claim that we need to treat SSIS packages like code. When I give presentations on this topic, I have a slide that elaborates and talks about how we must use source code control for our SSIS projects just like we do[1] for our “traditional” VB and C# projects. It’s one of those irrefutable best practices of software development; you wouldn’t drive your car without your seat belt, and you’d never even think of building software without source code control.

In fact, this is so obvious and so important, I had assumed that I must have blogged about it years ago. So I came here tonight to get the URL so I could reference it somewhere else.

And I couldn’t find it. So here’s the post I was expecting to find,[2] so that the next time I look I will find it here.

The points that I like to bring up in my presentations are how even though SSIS development is software development, there are inherent differences in the tools that can turn into “gotchas” if you’re not careful. In particular, walk through these basic steps just to make sure you’re following me:

  1. Open an SSIS package in Visual Studio – a package that has at least one user variable defined.
  2. Look up at the tab at the top of the package designer. You’ll see that the tab displays the name of the DTSX file followed by the word [Design] in square brackets, in case you weren’t sure you were looking at the designer and not the raw XML.
  3. Ensure that there is no asterisk (“*”) after the word [Design] – this asterisk shows that the file has been modified and needs to be saved, and for the purposes of this exercise we need it to not show up. If your package does show an asterisk, save it now to make the asterisk go away.
  4. In the Variables window, click on the name of a variable. Don’t change anything, just click.
  5. Then click on the design surface. Again, don’t change anything – just click.
  6. Then look up at the tab. What do you see?
  7. That’s right! An asterisk!! The package has been changed by a non-editing action on your part. Has the world gone mad?!?

Um… probably not. Or at least, not yet.

But let’s assume for the sake of argument that you followed this best practice and have been using source code control. And let’s further assume that, since no one told you any different, you set up your source control plug-in in Visual Studio and left it with the default settings.

Uh-oh.

Let’s take a look at those default settings, shall we?

SSIS SCC Evil

That’s right. The default settings are evil.

Why evil? Well, think back to the “not really editing a variable name” scenario from earlier. If this package were under source control, you just checked out that package without meaning to, and probably without realizing it. And that means that as soon as you walk out the door to go on that two-week vacation[3], you just became that guy. That’s right – the guy who gets blamed for things because he’s not there. Because the default source control settings in Visual Studio are to silently check out a file when the developer starts editing it. And while this may make sense for traditional code files, it definitely does not make sense for DTSX files.

And the worst part is, you really are to blame here, at least in part, because you did not resist, you did not fight back against the evil that consumed you and is now ravaging your team’s productivity. You’re kind of like Gollum.[4]

But do not despair – it is not too late to turn back from the path of evil! In fact, you really just need to change one drop-down, like so:

SSIS SCC Good

That’s right – just say “Prompt for exclusive checkouts” and the problem goes away[5]. Any time that you perform a step in the SSIS designer that would cause the package to get checked out (we walked through one with the variable name thing, but there are many more) you’ll be prompted to check out the file.

And that’s about it. As usual, I spent five pages to say what could have been done with one screen shot and ten words, but hopefully you’ll understand the context a little better this way.

So always use source code control, but also be certain that you have configured it correctly for SSIS.

[1] We do, don’t we?

[2] Yes, it’s late on a Saturday night, but hopefully you too will see that this is more than a little like time travel. Or is it just me?

[3] I did mention that you are going on vacation, right? Go on – you’ve earned it!

[4] Did I mention how much I loved you in those movies?

[5] Yes, this means that Sam and Frodo’s budding romance will probably never get a chance to bloom, but that is a price I’m willing to pay for you to enjoy your vacation.

Thursday, March 26, 2009

Going to Tech·Ed 2009 North America!

I’ve been to Tech·Ed North America every year for the last four years. It’s Microsoft’s biggest technical conference and is an amazing opportunity to learn and to network – and it’s always a ton of fun as well. People have been asking me a lot lately “are you going to be in Los Angeles?” and I haven’t been able to answer them.

Until today.

email_sig_2

I learned this week that I will indeed be going to Tech·Ed. I’ll be in Los Angeles the whole week, from May 11 through May 16, helping to represent Microsoft Learning at the conference. I’ll have more details to share about what I’ll be doing as the conference approaches, but for now I just think that I’ll be staying away from places where I see this sign:

TENA_blgr1_mctshang

Hang here? Yikes! ;-)

Seriously, interacting with Microsoft Certified Trainers is probably going to be a big part of my role at the conference. I just wonder how “hang” ever got through the layers of review.

So I’ll be there – will you?

Tuesday, March 24, 2009

Get on the Bus!

if (you.Location.Match("Atlanta",
    "Charlotte", "Philadelphia", "Columbus",
    "Indianapolis", "St. Louis", "Kansas City",
    "Denver", "Albuquerque", "Phoenix",
    "Las Vegas", "Los Angeles") ==
        MatchResults.CloseEnough)
{
    you.EnterContest("Get on the Bus");
}

if (you.WinContest("Get on the Bus"))
{
    you.AttendConference(Microsoft.Conference.TechEd,
        Cost.Free);
}

Sound like a good deal? Learn more and enter here: http://www.microsoft.com/learning/mcp/offers/getonthebus/default.mspx

150x300

(I just wish that I were eligible…)

Friday, March 13, 2009

Indexes and SSIS Data Flow Performance

Wow – it’s been a while, hasn’t it?

As you may know, I took a job with Microsoft back in October. It’s been a fun (and insanely busy) five months; the work is challenging and diverse, the team I work with is talented and motivated and there is always something new and interesting to learn and/or do. What’s not to like?

In fact, even though BI isn’t a core component of my job role, I have been working with SSIS and the rest of the SQL Server 2008 BI tools to get easier access to some of the data I need to make better decisions and to support the recommendations I make with some “real numbers.”[1] It’s an easy way to add some value to existing data, and for me it’s a fun way to keep my BI skills from getting too rusty.

So… I re-learned an interesting lesson the other day: Don’t load data into indexed tables.

One of the SSIS packages I’ve built extracts data from a set of text files and loads it into a set of SQL Server tables.[2] There is around 20 GB worth of text file data that ends up in one particular table. When I ran this package Monday evening, I found it was still running Tuesday morning, which was not expected. I looked around in SSMS and noticed that there was one non-clustered index on this table. I had added it recently and had not dropped or disabled it before running the package. So when the package finished running, I took a look at the execution results and found this:

DTExec: The package execution returned DTSER_SUCCESS (0).
Started:  6:03:50 PM
Finished: 9:54:20 AM
Elapsed:  57028.7 seconds

That’s right – the package took nearly 16 hours to run. Ugh!

So I re-ran the package this morning, after having ensured that all indexes on this table had been dropped. And the result?

DTExec: The package execution returned DTSER_SUCCESS (0).
Started:  9:49:21 AM
Finished: 10:41:15 AM
Elapsed:  3114.05 seconds

That’s right – the package completed in well under an hour, taking roughly 5.5% of the time it took to execute the package against the indexed table.

But wait, you say – that’s not a fair comparison. You still need to rebuild that index!

You’re right, of course – the time to rebuild the index needs to be considered before we can make any apples-to-apples comparison. So what I did was to re-create all of the indexes on this table (four indexes, each with included columns, for a total index size of 15.8 GB) which took an hour and 39 minutes. Even though this is longer than the package execution itself, the total end-to-end execution time is still only around 15.9% of the package execution time with one index on the target table. In the end we achieved an 84% reduction in execution time.

So what’s the lesson?

Drop or disable your indexes before loading data into your tables, and then re-create or rebuild them once you’re done. If you don’t, SQL Server will keep maintaining the indexes row by row while your package executes, thus killing your performance.
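
If you want to bake this into the process itself, the usual pattern is an Execute SQL Task (or a bit of ADO.NET) before and after the data flow – disable the nonclustered indexes, load the data, then rebuild. Here’s a rough sketch with made-up database, table, and index names; note that you should not disable the clustered index, since that makes the table inaccessible.

using System.Data.SqlClient;

class IndexMaintenanceAroundLoad
{
    const string ConnectionString =
        "Data Source=.;Initial Catalog=StagingDB;Integrated Security=SSPI;";  // hypothetical database

    static void Main()
    {
        // Before the load: disable the nonclustered index on the target table.
        Execute("ALTER INDEX IX_FactLog_Date ON dbo.FactLog DISABLE;");

        // ... run the SSIS package / bulk load here ...

        // After the load: a single rebuild re-enables the index, instead of paying
        // for row-by-row index maintenance during the load.
        Execute("ALTER INDEX IX_FactLog_Date ON dbo.FactLog REBUILD;");
    }

    static void Execute(string sql)
    {
        using (SqlConnection conn = new SqlConnection(ConnectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 0;  // rebuilding large indexes can take a while
            cmd.ExecuteNonQuery();
        }
    }
}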

This isn’t new advice – this has been a best practice for bulk loading data into SQL Server for as long as I can remember. But it is something that I have not blogged on before, and since failing to follow this advice bit me this week I thought I would take the time to share my pain with the world in hopes that someone else can avoid it.

And, of course, it’s good to blog on SSIS again. Hopefully I’ll be posting again before another few months have passed…

[1] Please note that I’m not calling this a “BI solution” here. At this point I am more in a discovery phase than anything else. The components I’ve developed, while functional, will probably get thrown away once they have served their purpose as prototypes and learning tools. All of the performance numbers come from running the packages on a laptop, with both the target data database’s data file and the source text files on the same external 5,400 RPM USB hard drive. No, I wouldn’t do this in production either. ;-)

[2] The astute reader will note that this is in flagrant violation of the “each package should only load data into one table” best practice that I talk about whenever I have someone who will listen. The scenario in which I’m working here provides an interesting exception to this guideline, and please trust that I did think hard before going against my own advice.

Friday, February 6, 2009

I’m Certifiable – Are You?

Odds are, if you read this blog you are certifiable. One way or the other.

But of course, I’m talking about Microsoft technical certifications in the developer and IT Professional world. (What were you thinking?) I’ve been a Microsoft Certified Trainer since 1996, and have taken any number of certification exams since then. I love certification for many reasons[1] – and one of the reasons is that I think that the certification exams are fun.[2] I love the ability to demonstrate the depth and breadth of my knowledge in an almost competitive setting – it’s almost like playing chess with myself.

But not everyone has seen the fun side of certification. Until now!

Microsoft Learning[3] has just launched a new online game that is like a cross between a certification practice exam and an Xbox 360 game show. It’s a riot – I played a quick game over lunch today and loved it.

But instead of reading my opinions, you should check it out here. Enjoy!

[1] But I’m not going to try to go into all of the reasons why. If you want more on that front, check out the Born to Learn blog where Ken Rosen and his team post just about every day.

[2] Please refer to the title of this post again.

[3] Yes, this is the group within Microsoft where I work these days, even though I can’t take any credit for the genius behind this site.

Friday, January 16, 2009

Loving Windows 7

I installed the public Windows 7 beta on one of my laptops early this week. This is a laptop that I use all the time when I’m at home (I love the keyboard, so I sit at this one and Remote Desktop into my other machines) but it’s been running Windows XP because Windows Vista never ran particularly well on its 2004-era hardware. And as is always the case when working with a pre-release operating system, there is both good news and bad news.

Let’s do the bad news first:

  • I’m traveling this week and next, so I only had two days to play with new features before heading out on the road.
  • I have a bunch of demos I need to do next week at the MCT Summit in Redmond, so I can’t blow away the Windows Server 2008 install on my demo laptop and replace it with Windows 7 until the Summit is over.
  • That’s about it for the bad news…

The good news is much longer, so I’ll just hit on the highlights:

  • Windows 7 runs faster and better than Windows XP ever did, even with the same software running. I’m vastly impressed by how well it runs, and I can’t wait to get it installed on more modern hardware.
  • It looks and feels great, with all of the nice features of Windows Vista (I love Vista search and some of the Explorer enhancements that I miss so much when I go back to Windows XP or 2003) and is more usable (and just plain prettier) as well.
  • Windows 7 automatically detected my Nvidia video card and downloaded drivers for it. This may sound trivial (and isn’t really something for which I can give credit to the Windows 7 team) but the new driver finally works. For all previous operating systems and driver versions, this video card has always displayed any video being played in full-screen mode on the secondary monitor. This meant that I could not work on one display while having a video (any video – not just Manowar videos) play on the other. With the new driver, this is no longer the case, and video finally works the way that it should.
  • Did I mention it’s amazingly fast? ;-)

But the thing that prompted me to post this article is not my own experience. I found today a list of “Windows 7 Secrets” published by Tim Sneath here: http://blogs.msdn.com/tims/archive/2009/01/12/the-bumper-list-of-windows-7-secrets.aspx. This includes a ton of keyboard shortcuts (one of my favorite tools) and ways to work with the new OS features that you’d be unlikely to stumble across on your own. Definitely check it out the next time you sit down at your Windows 7 machine, or if you’re wondering whether you should make the move…