Tuesday, December 25, 2007

Getting Back What You Put In

Today was Christmas, and one of the gifts I received is a delightful book titled Making Artisan Chocolates by Andrew Garrison Shotts. I've spent the day with family and friends, so I haven't had much time to read it, but the author's dedication at the front of the book contains a quote I just could not wait to share:
"From my father, Don, I learned work ethic and discipline. He taught me that a worthwhile outcome is fully dependent on the time and effort it takes to get there."

It's obvious that Mr. Shotts' father was a wise and thoughtful man. It doesn't matter if you're a chocolatier, a data architect or a mechanic, the effort that you are willing to put into your work determines the quality that you get back out. For me this usually means spending those late nights researching how SSIS works under the hood, or how to eek just a little better performance out of a stored procedure (although now that I have this book I may well be spending some time dusting off my chocolate tempering skills) but no matter who you are and what you do, these are words truly to live by.

And now, those raspberry-wasabi chocolates look irresistible...

Friday, December 21, 2007

You Can't Go Home Again

Warning: This is a completely non-technical post. If you want to know anything about SQL Server or SSIS, please skip this one. If you want a little insight into my brain (please don't ask why you would want this - I have no idea) then feel free to read.

On my drive to my users group presentation in Rochester, NY two nights ago, I drove through my old home town. It was on the way and I wanted to stop by an old workplace to share some Christmas cookies I'd baked with some ex-coworkers.

No, that doesn't really cover it. I need to back up a little to do this justice.

Back before I started my career in IT, I worked for a regional retail drug store chain called Fay's Drugs. This was my first job; I started working there when I was 16 and stayed employed there for far too long. Although many people who know me today find this hard to believe, I used to be a seriously under-motivated person. I always pushed myself to do everything that I did as well as I could (these days arguing with the boss tends to get me a raise or a bonus; in those days it got me written up and yelled at, but that didn't stop me) but I wasn't particularly motivated to go out looking for bigger and better challenges like I seem to do today. I was comfortable where I was, and in those days I wasn't compelled to constantly push myself out of my comfort zone.

And Fay's was like a second home to me. Maybe even a first home, because for many years I spent more waking hours there than I spent at my real home. I made friends (and more than friends) and learned all sorts of valuable lessons. I've been known to say that the things that have made me successful as an IT consultant are the things I learned at my "dead-end retail job" and not what I learned in my computer science classes in college. And although I moved on many years ago, I have still kept in touch with many of the friends I left behind. And even though Fay's was bought out shortly after I moved on (it hasn't been called "Fay's" by anyone but me in well over 10 years) it was still home to some part of me.

And in the years that followed, I slowly developed this idealized vision of the "Fay's that never really was." Things truly were much simpler then (of course, when you make 1/10th of the money, you generally get a lot less job stress too) but my mind turned it into much more than it ever really was. When deadlines have loomed in my consulting work, I can tell when I am pushing myself too far, too hard, when I am too stressed for my own health, because I have dreams of working for Fay's again. It's sort of like a subconscious canary in my mental coal mine.

In short, silly though it may sound, I have had a lot of emotional investment in this long-lost part of my past.

So imagine my shock when I discovered that this part of my past was gone.

When I pulled up to deliver those cookies, I found that the store where I had been hired for my first job, where I worked for many years, where I learned so many valuable life lessons and made so many friends, had closed its doors forever. It was like a physical blow.

I couldn't dwell on it then - I had a 75-mile drive ahead of me still, and a presentation to deliver, and I could hardly do that if I were all mopey and distracted.

But this has been weighing heavily on me ever since. I've been down and depressed - not a lot, but enough that I felt compelled to write about it, that perhaps I could achieve some sort of catharsis by putting my pained thoughts into words. Who knows if it will work. I've heard the phrase "you can't go home again" many times in my life, but never until now has it really hit home quite like this...

Design Conflict

I had a dinner meeting a few weeks back with some members of my development team at work (including the new VP!) and one of the topics of the discussion was that of team dynamics. I've found over the years that the most successful teams are teams of "invested equals." Invested because people only give their best when they think that they can make a difference and when they know that they'll be rewarded, and equals because people (for some strange reason) tend to do what their bosses tell them to, as opposed to pushing back and demanding that their voice and their ideas be heard.

As part of this wonderful dinner conversation, I told a story of how Ted Malone (who still claims to not regret recruiting me) and I were both attempting to come up with a design for a vital new application component. He presented his idea. It was so wrong. I presented my delightfully well thought out idea. He failed to see its beauty. (Please keep in mind that Ted may have somewhat different recollections of this morning.) So we dueled, whiteboard markers at dawn, as it were, and by the time the dust had settled the resulting design was more complete, more elegant and more satisfying than anything either one of us could have come up with alone.

Which brings me to a quote:
"A design that comes out of an argument is always better than a design that comes out of a committee."

Believe it or not, this is actually my own quote. (I'm not usually nearly this pithy, and am forced to quote the brilliant people around me.) It's been sitting in the back of my brain for the last few weeks, waiting for a chance to come out again.

And that chance may well be on the horizon. I've been involved in an email discussion with our CTO (who may well have the biggest brain on the planet - this guy is scary sometimes) on how we may be able to apply Microsoft BI technologies to solve some very interesting (and by "interesting" I mean oppressively difficult) problems in the configuration analytics space. The opportunity I see ahead lies in the fact that I’m the type of person who needs a real, concrete problem to look at and to wrap my brain around. Then I can step back, generalize and come up with an abstract “problem domain” that represents the whole problem to be solved, of which my concrete example was only one instance. But I need that one concrete instance in order to begin.

Dennis, on the other hand, is the type of person who always thinks in abstracts. (Or in any event this is the impression I've gotten; I honestly haven't asked him. Yet.) He always has that overarching “big picture” in mind, and even though he can drill down into the little details at will, that’s not how he looks at the problems natively.

So Dennis and I have some whiteboard time scheduled for next month. My brain is almost literally salivating (yeah, picture that one) at the thought of the mental duel that lies ahead. Bring on the conflict!

Thursday, December 20, 2007

SSIS and SQL Server Agent

SQL Server Integration Services is great. But SSIS deployment can sometimes be painful. One of the things that makes it painful is that SSIS is built on a relatively complex set of technologies (like SQL Server, Windows Server and Active Directory[1], for starters) and when you go to deploy your ETL solution, you need to have a pretty good understanding of how those technologies work - and how they work together - or else you could well be in for a rough ride.

Nowhere is this more evident than when it comes to deploying SSIS packages to run via a scheduled SQL Server Agent job. I wish I had a dollar for every time someone posted this message on the SSIS forums on MSDN:
"I built my package and it works great, but when I deploy it to SQL Server Agent it fails. Is this a bug in SSIS?"

Yes, I blogged on this back in May. But I've seen this same problem in one form or another at least a dozen times this week on the forums, so I wanted to post on the same topic again. It's important. And although no one part of it is really complex or difficult, there are enough parts that come together during deployment that a lot of people are really struggling with it, despite the volume of documentation and other information that's available online. Hopefully this post will help, at least a little.

Here's the deal [2]:
  • When you run a program in Windows, that running process has your permissions. This is why Microsoft Word can save files to your user drive, when only you have permissions to access that folder. Each running process has the access permissions of the user account that launched it.
  • When you run a program from a SQL Server Agent job, that running process has the permissions of the execution context configured for the job step. By default this is the SQL Server Agent service account, but it can also be a Proxy account, depending on how the job step was configured.
  • By default, the SQL Server Agent service runs in the security context of the Local System account. This means that it has lots of privileges on the local computer, but none whatsoever on the network.

See the problem? Nine SSIS packages out of 10 need to access remote resources. These could be SQL Server databases (remember how we use Windows Authentication whenever we can?) or text files on a file server, or an Access database (MDB file) on a file server, or any number of other things. SSIS packages generally exist to move data from one place to another place, and generally those places aren't all on the same machine, so the packages need to execute in the security context of a user account that can access all of those resources on all of the machines involved.

Please let me repeat/rephrase that, because it's the really important thing in this post.

"SSIS packages need to execute in the security context of a user account that can access all resources on all of the machines involved in the package processing."
What does this mean to you?

It means that when you run the package yourself, the package has your permissions. It can do what you can do and access what you can access, for the simple reason that you ran it. And if you didn't have permission to access all of those resources when the project started, you discovered this problem early on, and got the access you needed because otherwise you couldn't have built the package at all in the first place.

So of course it runs for you!

But what about that SQL Server Agent job?

It should be pretty easy to figure out from here, right?
  1. You need to create (or work with the people in charge of your network to get created for you) a Windows user account[3] that has all of the permissions necessary to access the resources your package uses, databases, file system folders, shared folders and so on.
  2. Then you create a SQL Server Agent Proxy based on that Windows user account.
  3. And then you tell SQL Server Agent to run your job step in the context of this proxy.
One, two, three, Go!

That's really how simple it is. And the craziest part of all? Microsoft has already documented the heck out of this online. Here are a few key links that will go into much more depth than I have here, so if you're still having problems, why not click on them and read what Microsoft has to say.
  • From SQL Server Books Online, "How to Run a Package Using a SQL Server Agent Job": http://technet.microsoft.com/en-us/library/ms139805.aspx. (Please note that this page has links to sub-articles and related articles. Click on them and read those pages too.)
  • From the Microsoft Support Knowledge Base, "An SSIS package does not run when you call the SSIS package from a SQL Server Agent job step": http://support.microsoft.com/kb/918760. This article goes into many scenarios that could cause an SSIS package to fail when executed in a SQL Server Agent job, not just the security related stuff in this blog post, but from my experience, it's the security that is tripping up most people.
Are you ready to go? I sure hope so. Hopefully the next time someone runs into this problem I can simply point them to this URL instead of having to type up this information each time. And maybe, just maybe, someone will search and find this article before they cry for help on the forums.[4] And if that happens, it will all be worth it, and these blisters on my fingers won't seem to hurt at all anymore... ;-)

[1] No, SSIS isn't really "built on" AD, but read on and you'll see what I mean.
[2] Yes, I know I'm simplifying things quite a bit here, but that's intentional.
[3] If you have Active Directory, this needs to be a domain account. If not, then you have a bunch of reading to do on Windows security, because I'm not going there today.
[4] If you are that person, please post a comment to let me know.

Merry Christmas (of Steel!)

As regular readers of this blog know, I'm a huge fan of the heavy metal band Manowar.[1] Their music is incredibly powerful and is a constant source of energy and inspiration for me in many ways in my personal and professional life. I try not to let too much of this personal passion bleed over into this largely-technical blog, but in the spirit of the Christmas holidays, this time I could not resist.

Christmas holidays? Manowar? What am I talking about?

That's right - Manowar has released their first ever Christmas carol! They just made available both English and German versions of Silent Night for free download as part of a promotion with Amazon.de[2]. Please take a moment from your busy day and download one or both versions and see how Eric Adams' amazing voice delivers this classic Christmas song, and understand at least a little of how Manowar's music inspires me every day.

Merry Christmas!!

[1] You can find their official website at www.manowar.com, but it's not always work safe, and violates most UI and design guidelines known to man, so browse at your own risk.
[2] If you click on the link you'll see that the promotion is for the latest Manowar DVD, filmed at the Magic Circle Festival in Bad Arolsen, Germany this July. You should also definitely pick up the DVD as well, both to enjoy the music and to see if you can spot your favorite SSIS blogger on stage and in the front row. ;-)

Last Night's Deployment Slides Online

I had a great time last night in Rochester at the Visual Developers of Upstate New York users group meeting. I delivered a session on SSIS Deployment, based on requests from the organizer of the users group, who attended my ACM presentation back in October. Deployment is not one of SSIS's strongest points, so I knew we'd have a lot of fun. We had a great turnout and some killer questions, and everyone had a great time.

I've put the slide deck from last night's presentation online - you can get it here.

There's not a lot of "meat" in the slides (this is probably why I was actually able to finish on time, which is pretty much unheard of when I only have 90 minutes to work with) but quite a few people asked for them, so here they are. If you have any questions, comments or criticisms, please be sure to post them here.

Tuesday, December 18, 2007

Reminder: SSIS Deployment Presentation Tomorrow

In case you've forgotten to add this to your calendar, I am delivering a presentation on SSIS deployment at the Visual Developers of Upstate New York users group meeting tomorrow night. The meeting is scheduled to begin at 6:00 at the Microsoft offices in Rochester, NY. I've heard from the users group organizers that they have received many more RSVPs than they usually get for presentations this time of the year (it's hard to compete with Santa) so you may want to plan on showing up a little bit early too.

Don't forget - After the presentation there will be plenty of time for Q&A, so be certain to bring your tough SSIS questions!

Yay! PDC 2008 Dates Announced!

I'm a few days late in hearing this news, but my friend Anton let me know this morning. Consider it an early Christmas present (why do I have Elvis's "Blue Christmas going through my head now?) from Microsoft to you:

Save the Date!
Announcing PDC08
October 27–30, 2008
Pre-conference October 26, 2008
Los Angeles, California
OK, OK. We are delighted to announce the date and location of the next Microsoft Professional Developers Conference (PDC): October 27–30, 2008 at the Los Angeles Convention Center. PDC is the definitive Microsoft event for software developers and architects focused on the future of the Microsoft platform. Mark your calendars and save the date. More information coming soon.

http://msdn2.microsoft.com/en-us/events/bb288534.aspx


I'll be there - will you?

Sunday, December 9, 2007

Books, books, books!

My "A Whirlwind Tour of SQL Server Integration Services" presentation last week at the CNY Developers User Group was a huge success. We had a big turnout and everyone seemed very enthusiastic about the capabilities of the SSIS platform, and many attendees stayed around very late to ask questions and get into wonderful tangents about SSIS.



And one of the most frequently asked questions was "what SSIS book should I buy?" Now in an ideal world this should be an easy question to answer (and in an ideal world I would have had a slide handy to answer it) but sadly, there is no single SSIS book that I can wholeheartedly recommend to developers who are new to the SSIS platform and tools. So instead of having a nice simple answer, I'm forced to resort[1] to answering with a list of links and explanations. And here is that list:


  • Microsoft SQL Server 2005 Integration Services by Kirk Haselden: If you're new to SSIS, this is the one book to buy. It's not a great book (sorry Kirk, but I'm awfully picky) but the author has great insight into the product (as well he should, since he led the development team for SSIS up through RTM) and does a good job of communicating its ins and outs to the reader. If this book went into more depth and had more real-world examples (yes, and had twice as many pages) it would be a must-have.
  • Professional SQL Server 2005 Integration Services by Brian Knight, Allan Mitchell, Darren Green, Douglas Hinson and a bunch of other people: This book fills in a few of the cracks that Kirk Haselden's book does not, but other than that it doesn't really deliver for me. If you have a book budget and need to buy more than one introductory SSIS book, get this one too, but otherwise you should probably pick up Kirk's book and move on...
  • The Rational Guide to Extending SSIS 2005 with Script by Donald Farmer: This is the best resource I've found on using the Script Task and Script Component in SSIS, and is one of my favorite technical books in general. It's very well written with deep insight into the product, and is a very easy read. However, scripting is one of those "tools of last resort" for most SSIS projects, so this book is also not a "must have" if you're just getting started.
  • The Microsoft Data Warehouse Toolkit by Joy Mundy, Warren Thornthwaite, and Ralph Kimball: This is probably the only "must read" book on my list, but it's not really a SSIS book, or a developer book, so I don't know if it's as appropriate for the people who asked for book recommendations as it is for me. This is a great "BI best practices" book, and while it has quite a bit of SSIS content, but it covers the whole gamut of real-world BI/DW projects from a "you need to do these things" perspective as opposed to a "this is how you do these things" perspective for SSIS, SSAS and SSRS with some other goodies thrown in as well.
  • Foundations of SQL Server 2005 Business Intelligence by Lynn Langit: This book provides a great introduction and overview to the Microsoft BI stack (kind of like the last one, but thinner) and with a more hands-on, how-to focus. (I was the technical reviewer on this book, so I realize that I'm biased and will stop before I say too many glowing things about it. ;-)
  • Expert SQL Server 2005 Integration Services by Brian Knight and Erik Veerman: This is my favorite SSIS book today, hands down. So why did I save it for last? As this title implies, this is not an introductory book - it assumes that the reader has a good foundation with SSIS fundamentals and is ready to start solving real-world problems more efficiently using tested tools and techniques. So this one is a "must have" but it may not be a "must have right now" for everyone.

Wow, it took me a long time to complete this post. Last month I averaged a post every day. Here it is now, halfway through December and I've barely managed to publish two semi-technical posts. (And it took me almost a week and half to type a list of books - how crazy is that?) Why is it that December is always the busiest month of the year for technical projects? It always seems to be that there is 50% more to do in December than any other month, despite the holidays. I've been swamped, and my blogging productivity is suffering, but at least I'm making all of my client-facing deadlines...

Anyway, enjoy! And let me know if you have any questions, too.



[1] Boy, it's great to have someone to blame! I never have nice simple answers anyway, but at least this time I can blame the SSIS book market!

Saturday, December 1, 2007

Zoom, Baby, Zoom!

I know that this is likely to be the least technical SSIS blog entry you read today, but it's still useful, and no one else I've talked to seems to know that you can do it, so I'm going to blog on it anyway.

If you hold down the ctrl key on your keyboard and scroll the wheel on your mouse, you can zoom in and out of your SSIS control flow or data flow designer in Visual Studio. This is really handy for those extra-large packages when you need to see more than your monitors permit, and since the built-in designer tools have inconsistent support for zooming in control flow and data flow, it's both convenient and consistent.

Low tech, but cool and useful.

And to make this even more useful, you can use the same technique in just about any modern Windows application that supports zooming, including Internet Explorer and Word. Enjoy!

Friday, November 30, 2007

The Pain of Living on the Edge

If you're like me, you tend to listen to Manowar really loudly while you write code and built ETL solutions.

No, wait, that's another post. Please let me begin again.

If you're like me, you tend to stay on the cutting edge, upgrading to the latest and greatest development tools as soon as they're available.

Yes, that's the right post.

The great things about this tendency are that you get new features and functionality early, you are constantly learning new things, and when the products are actually released you already have months experience using them, so not only do you have a competitive edge, you can also help others who are less adventurous learn things too. All around it's a real win-win scenario.

But it is not without its pain.

Visual Studio 2008 has recently been released, with important (and cool) new capabilities for people developing Windows and web applications using VB.NET, C# and other "traditional" development tasks. But what about us Business Intelligence developers? Well, do you remember that song from the opening of "Four Weddings and a Funeral?"[1] Well, we've been left out in the cold, like Hugh Grant. The current CTP5 release of SQL Server 2008 still relies on Visual Studio 2005, so if you want to get the goodness of Visual Studio 2008 when building SSIS packages, you're out of luck.

...but like with Hugh Grant, there is a happy ending in sight. Michael Entin, a member of the SSIS product team, posted this information today on the SSIS Forums on MSDN:
1) SSIS 2005 requires VS 2005, and will not work with VS 2008.
2) SSIS 2008 RTM will work with VS 2008 (and only with VS 2008 - there will be no option to use it from VS 2005).
3) It is not final yet when this will be implemented, most likely the next CTP will use VS 2005, and there will be a refresh that provides VS 2008 support.

Michael was careful to point out these these plans are tentative and subject to change, but it's still exciting to know that there is a plan and a roadmap for getting the tools we want hooked up with the BI projects we need and love. Now let's just hope that Visual Studio doesn't marry a rich old Scottish gentleman before then...

[1] In case you missed that one, it was "But Not For Me" by Elton John and went "they're singing songs of love, but not for me..."

Thursday, November 29, 2007

SSIS Performance Tuning Strategy

Similar to my post on SSIS Performance Tuning Techniques, this resource is neither mine nor new, but it looks like a lot of people don't know that it exists. Check out this "SQL Server 2005 Integration Services: A Strategy for Performance" white paper: http://www.microsoft.com/technet/prodtechnol/sql/2005/technologies/ssisperfstrat.mspx#E6G

According to the abstract, "This white paper covers several interesting and unique methods for managing metadata using built-in features in SQL Server 2005 Integration Services, Analysis Services, and Reporting Services. The paper explains such concepts as data lineage, business and technical metadata, auditing, and impact analysis. " This is really weird to me, because there's nothing about SSAS or SSRS in here at all, especially since there's also nothing about data lineage or metadata management. Copy and paste gone wrong, perhaps?

To me the most interesting thing here is the "OVAL" (Operations , Volume, Application, Location) approach for evaluating and tuning SSIS performance. Lots of people on the MSDN SSIS Forums have been asking lately about SSIS performance, and either no one is bothering to look, or else this white paper is difficult to find.

And for people too lazy (or should that be "visually oriented") to read through the whole thing, Donald Farmer (then the SSIS Program Manager) did an excellent presentation at TechEd last year about this OVAL approach, and you can view it on demand here: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032298087&CountryCode=US

But remember: Performance tuning is a complex art that is a superset of all of the other things that go into building software on any platform, and SSIS is no exception. In order to do it right, you need to understand how the platform works and how your software interacts with it, and have a methodical approach for identifying and refining the scope of performance problems. Although it's wonderful to have an approach like OVAL to use, without an understanding of how SSIS works under the hood, it's not going to do you much good...

Wednesday, November 28, 2007

Calling All Trainers!

If you're a Microsoft Certified Trainer, this is a very exciting time. With new versions of SQL Server, Visual Studio and Windows Server right around the corner (or in the case of Visual Studio 2008, upon us already) you are faced with a massive influx of new tools, technologies and (hopefully) courses to teach. By the time these products ship, your students will expect you to be the expert, and there's never enough time to prepare...

Fortunately, Microsoft understands.[1] And to help out, they're putting on a series of MCT Summit conferences designed to help prepare MCTs with both hard (technical) and soft (presentation) skills they'll need to prepare for the rough waters ahead. There is an MCT Summit in Berlin, Germany on January 14 to January 16, and another in Redmond, Washington on February 4 to February 6[2] so regardless of which side of the ocean you're on, Microsoft has you covered.

But how does this relate to this blog? Funny you should ask!

I will be presenting three different SQL Server-focused sessions at each of the Berlin and Redmond Summit events. Here are the details:

What's New in SQL Server 2008 for Database Developers
This technical session presents information on the key new features for database developers in the SQL Server 2008 product release. Emphasis is placed on the value proposition of the new features, how the features enhance and extend the functionality in SQL Server 2005, and how to communicate the importance of the features to your students. Where possible, the new features are also discussed in the context of the courseware in which they are presented.

What's New in SQL Server 2008 for Business Intelligence Developers
This technical session presents information on the key new features for business intelligence (BI) developers in the SQL Server 2008 product release. Emphasis is placed on the value proposition of the new features, how the features enhance and extend the functionality in SQL Server 2005, and how to communicate the importance of the features to your students. Where possible, the new features are also discussed in the context of the courseware in which they are presented.

So you want to be a SQL Server MCT? Jumping into the Deep End with SQL Server 2008
This session presents information for MCTs who do not currently teach SQL Server courses with the information that they need to be successful delivering SQL Server 2008 courses. The first portion of this session covers a technical introduction to the SQL Server product suite with an overview of core products and features. The second portion covers existing training materials and presents a proposed plan of self-study to help attendees build the skills they need to succeed in the classroom with the SQL Server 2008 courseware.

The Summit events look like they're going to be a lot of fun, and if you're planning on teaching the SQL Server 2008 courseware when it's ready, this is a great way to get up to speed on the new tools, technologies and features heading our way. I'll see you there!

[1] Well, sort of.
[2] There are also summits in Sydney, Australia and Singapore, but they didn't fit into my schedule. :-)

Speaking at SQL Connections 2008

I just received news that my abstract for the session "SQL Server Integration Services Deployment Best Practices" was selected for the spring 2008 SQL Connections conference. As you can imagine, I'm awfully excited! Here's the full abstract:
"So you’ve built a set of SSIS packages to populate your data warehouse, and everything works great on your development environment – but what next? SSIS has a great set of tools, but when it comes to preparing for and executing a seamless deployment from development into test and finally into production, it’s not always obvious what tools to use or how to best use them. In this session you’ll learn how to prepare for and execute a seamless deployment of complex ETL solutions through a repeatable, testable process. If you’re tired of saying “but it worked on my machine!” then this is one session you can’t afford to miss."

I submitted a handful of different abstracts, but this is the one that got selected, and there's probably no surprise there. Many of the questions posted on the SSIS MSDN Forums are related to deployment, and when people complain about SSIS (like that would ever happen, right?) deployment almost always comes up. Hopefully before too long the SSIS team will have a set of powerful deployment tools that take away the pain of deployment, so people would look at an abstract like this and say "why would anyone ever attend a session on this?" But from what I see today it's not happening in SQL Server 2008, so you'll probably want to attend this one.

I'll see you in Orlando!

Tuesday, November 27, 2007

Let Your Voice Be Heard!

The SQL Server Integration Services Team is looking for your feedback on how to prioritize their work on future versions of SSIS. How cool is that? It's not quite as cool as being invited to an SDR (yeah, this is kind of like saying Hammerfall isn't quite as cool as Manowar - they're somewhat similar, but orders of magnitude different in how cool they are) but you still get to share your opinions with the people who build the tools you love.[1] Or not. Either way, it's cool.

And to make it even cooler, they're giving away five 80GB Zune MP3 players to people who complete their survey by December 20th.

Here's the survey: https://mscuillume.smdisp.net/Collector/Survey.ashx?Name=SqlETLSurvey2

How important is this? Consider the fact that I'm posting a link to it on my blog, even though each person who enters is reducing my chances of winning that Zune. So let your voice be heard!

[1] For all SSIS team members reading this post, yes, I would love to come to your next SDR. You don't even need to give me a Zune. Although I'd take one. ;-)

Tuesday, November 20, 2007

SQL Down Under Show Online

Remember back on Halloween when I recorded an episode of SQL Down Under with SQL Server MVP Greg Low? Well, the episode is now online! You can download it in MP3 or WMA format here: http://www.sqldownunder.com/

Enjoy!

Monday, November 19, 2007

Things are Looking Up in CTP5

One of the things I've really been looking forward to with SSIS in SQL Server 2008 CTP5 is the enhanced Lookup transform. And related to that was looking forward to blogging about it.

But of course, I'm a day late and a dollar short. Or as the case may be, three days late and a few bucks ahead. Because Jamie Thompson has already blogged about it, and I don't see anything for me to add:

http://blogs.conchango.com/jamiethomson/archive/2007/11/16/Katmai_3A00_-SSIS_3A00_-Lookup-component-gets-a-makeover.aspx

So now I don't need to blog it, I can just link to Jamie's post, and get back to work. Thanks, Jamie!

Katmai CTP5 - Public Download Available!

Microsoft has just posted the SQL Server 2008 CTP5 downloads to the public Download Center:

http://www.microsoft.com/downloads/details.aspx?FamilyId=3BF4C5CA-B905-4EBC-8901-1D4C1D1DA884&displaylang=en

Please note that there are both installers and ISOs for x86, x64 and IA64 as well as an Express Edition download. Very cool stuff - Go get it!

Line Numbers, Where Art Thou?

So as mentioned in my last post, I'm delighted to be able to use C# in the Script Task and Script Component in SSIS for Katmai CTP5. I was a bit less that delighted, however, to not have any line numbers in the the VSTA editor. And to make matters worse, there were no options to turn on line numbers in the Tools Options dialog box!

Yes, I become alarmed very easily.

Then I noticed the "Show all settings" check box, and checked it. This, of course revealed the option I was looking for:



And this, in turn, allowed me to make my code window look like I need it to look:







I know this isn't high-tech or difficult, but it took me a minute or five to figure out what was going on, so I figured I might not be alone in my distress...

Katmai Scripting: A Step in the Right Direction

In a recent thread on the SSIS forums on MSDN, a poster was bemoaning the number of steps required to open the VSA script editor for the Script Task in SSIS. And I was sharing his pain.

Fortunately, there is a little improvement on the horizon, but just a little at this point. In CTP5 of SQL Server 2008 the "Edit Script" button (note that it has been renamed from "Design Script" but that the keyboard shortcut key remains "S") is now on the first tab of the editor dialog box, so one fewer steps is required to open VSTA.

































Also note that C# is the default language in the Editor dialog box. Oh, how I look forward to being able to use C# for my SSIS development needs...

But we still need that "Edit Script" context menu item, don't we?

BIDS Helper 1.2 Released

BIDS Helper is a CodePlex project that delivers a set of Visual Studio add-ins for developers working on BI projects. It's been around for a while, but until now has not had any SSIS-focused features.

Until now!

The BIDS Helper team has just released version 1.2 of their tool, and it now includes (in addition to a bunch of SSAS-related stuff as well) these new SSIS features:
  • Create Fixed Width Columns - Simplifies the definition of a fixed width flat file connection manager
  • dtsConfig File Formatter - As far as I can tell, this is just another shortcut for the ctrl-k, ctrl-d "prettify" keyboard chord that is built into Visual Studio
  • Expression and Configuration Highlighter - Color-codes tasks and connection managers that have properties set by configurations or expressions, making your packages more self-documenting.
  • Expression List - Adds a new "Expression List" window to Visual Studio's "Other Windows" menu; this window displays all expressions defined within the package.
  • Smart Diff - Provides a pre-processor for the built-in VSS or TFS differencing tools, to make it easier to diff two DTSX files' XML content.
All in all, this looks like a major step forward in the SSIS developer's toolbox. The Expression and Configuration Highlighter and Smart Diff features in particular provide functionality I have often wanted, and even if they're not exactly what I would have liked, they're still better than what we had before. Check it out!

Saturday, November 17, 2007

Installer++

So SQL Server 2008 CTP5 is hitting the public distribution channels, and is hopefully coming soon to a virtual machine near you. I've got my Katmai VM rebuilt with CTP5 already, and with a little luck (including my getting over this darned cold) we'll have a bunch of new Katmai SSIS posts here in the days ahead.

But before we can get to SSIS we have to mention the cool new installer. I remember being delighted with the installer in SQL Server 2005 (although admittedly I was comparing it to the BizTalk 2004 installer, and just about anything would be a huge advance over that monster) but the new SQL Server 2008 installer is a big improvement. Check it out:

































Bah! I was planning on having a lot more text to go with the flood of images, but I can't seem to make the Blogger editor do what I want it to do, and I'm sick, and it's late. So for now, please enjoy the pretty pictures and we'll see what we can do with other posts tomorrow...

Thursday, November 15, 2007

SQL Server 2008 CTP5 - I'm Ready - Are You?

It looks like the next CTP release of SQL Server 2008 (CTP5) will be available for download tomorrow:

http://weblogs.sqlteam.com/billg/archive/2007/11/15/SQL-Server-2008-CTP5-on-Friday.aspx

This release is due to have significant improvements over CTP4, both in general and in the context of SSIS, so make sure you get your sandbox environment ready today.

Guess what I'll be doing this weekend?

Wednesday, November 14, 2007

Precompile, Recompile, Decompile

As I mentioned in a post yesterday (or the day before... it's been a long week already) I've been spending a lot of time lately in the Script Task in SSIS. Well, I have the task logic implemented and tested, so now it's time to copy that task into all of the packages that need it.[1] So I now have 20 or so packages with the same script in their OnPreExecute event handler, and everything runs great.

Until you look at the package output. During testing I generally pipe DTEXEC's standard output to a text file and use that as my "to do list", and with these updated packages I have a 1500+ line file filled with this warning, over and over:
Warning: 2007-11-14 13:53:07.67
Code: 0x00000003
Source: Script Task Name Here
Description: Precompiled script failed to load. Attempting to recompile.
For more information, see the Microsoft Knowledge Base article, KB931846 (http://go.microsoft.com/fwlink/?LinkId=81885).
End Warning

Not a bad warning, all things considered, and it even has a URL - cool!

Of course, the information in the KB article in question doesn't apply to my scenario, as I'm on a 32-bit machine and am fully patched with SP2 + hotfixes. Hmmm... Time to do some searching...

Well, I'm not alone: http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=2030858&SiteID=17

And people have been having this problem with Script tasks in packages created from template packages: http://bobp1339.blogspot.com/2007_09_01_archive.html

A-ha!

I copied and pasted each of these script tasks from the package in which I performed the initial development and testing. Not quite a template package in this case, but it's close. So...
For Each Package in AffectedProjects.Packages
Package.Open
Package.EventHandlers.Select
Right-click(ScriptTask)
KeyPress(e)
Ctrl-Tab
Alt-S
Wait(5)
Alt-F4
Ctrl-S
Ctrl-F4
Next

The good news: it works! The warnings are gone!

The bad news: that was MScript - the Matthew Script Language, which performs horribly and does not scale, because it relies on my poor typing skills.

Maybe some time I'll find time to figure out how to automate this in the SSIS .NET API and add it to PacMan. Maybe some time I'll find time to sleep too...

[1] Yes, I know about how custom SSIS components have a much better reuse story, but for now it's more important for me to keep my deployment as simple as possible, but thank you for caring.

Monday, November 12, 2007

Three Lock Box... One Lock Variable...

I've been working in the SSIS Script Task for the past few days, and have been both loving and hating it. I've been loving it because it's been so long since I've written "real" .NET code and hating it because it's VB.NET and not C#. Yes, and also hating it because the developer and debugger experience for writing SSIS code is less than ideal. That too.

Anyway, I've been making great progress when I got this delightful error:

Error: 0xC001405C at <>: A deadlock was detected while trying to lock variables "<>" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out.

WTF, I asked myself.[1] I haven't changed any code since the last time (the last 10 times) I ran this package successfully. What on earth could have changed?

So I spent a little time digging around online and found lots of references to installing SQL Server 2005 SP1 and post-SP1 hotfixes, but since I'm running SP2 plus hotfixes already this didn't seem particularly relevant. So what had changed?

Well, since it wasn't the code, I thought, it was probably the inputs to that code. The Script Task code in question was reading in name/value pairs from an external file[2] and updating the variable that matches the name with the value from the file. So what had I changed that caused this error?

I had added the same variable more than once to the file.

Apparently if you use VariableDispenser.LockOneForWrite more than once in the same Script Task for the same variable, SSIS doesn't handle it particularly well. I haven't really had time to explore the full boundaries of the problem behavior (and probably won't for the next week, based on what my project schedule looks like) but I wanted to take the time to post this online in case someone else ran into the same thing.

And now, back to my regularly scheduled overtime...


[1] It's true. I'm working on integrating Windows Teamware Foundation into my data warehouse.
[2] It's a little more involved than this, but this is the important part. And yes, I know all about SSIS Package Configurations, thank you.

Developer's Crossing

Miller's Crossing is one of my favorite movies, and is in my opinion the brightest gem in the Coen brothers' crown, despite the fact that few people know about it when compared to Fargo or Raising Arizona and the rest. If you've never seen it, and you like complex plots, flawed anti-heroes and a bit of odd humor, and don't mind a little violence (ok, a lot of violence) here and there, you owe it to yourself to watch it.


But what does this have to do with SQL Server and BI?


Complexity, that's what.


Consider this quote from Tom Reagan (played by Gabriel Byrne)
"Nobody knows anybody. Not that well."

Sound familiar? When was the last time you felt like you really knew a lot (or knew "everything" for the really smart readers out there) about SQL Server? I remember when I decided to "specialize in SQL Server." Then I decided to "specialize in SQL Server development." And now I'm on the cusp of specializing in SQL Server BI (which actually seems like an increase in scope, but perhaps that's just me) because the breadth of the product has gotten so large that one brain (ok, my brain) cannot encompass it all.

This all struck home today when someone I know (I won't name any names, but he doesn't have a blog) asked me a question that I thought fell very well into his personal specialization. As this is a person whose technical knowledge I respect greatly, it got me thinking, and in the process of working with him to find the solution, this quote filtered up from the depths of my brain.

And that's it. So as the USB hub said to the thumb drive... "Take your flunky and dongle."

You fought with my father in the BI Wars?

(I totally stole the whole "BI Wars" idea from Ted Malone, but since he doesn't have a blog, I don't feel at all bad about it.)
"IBM to Acquire Cognos to Accelerate Information on Demand Business Initiative"

That's the title of the press release from IBM. (http://www-03.ibm.com/press/us/en/pressrelease/22572.wss) Following Microsoft's acquisition of ProClarity (now very old news, I know), Oracle's purchase of Hyperion and SAP's offer for Business Objects, this is just one more loose end being tied up (or snatched up, as the case may be) in the BI software landscape.

So what does this mean to those of us who make our living working with the Microsoft BI toolset? Well, I certainly don't claim to know (my track record for predicting industry trends is less than stellar) but I sure hope that it means Microsoft will be putting even more emphasis on BI in the months and years to come. If other database vendors start offering comprehensive and integrated BI offerings like Microsoft already does today, then Microsoft is likely to lose core differentiators that make the SQL Server BI stack unique.

And we don't want that to happen, now do we?

Thursday, November 8, 2007

Widening the Divide?

Microsoft has recently announced that it is splitting its massive TechEd US conference into two week-long events, starting in 2008. One week will be focused on Developers, the other focused on IT Professionals. In some ways this is probably going to be a good thing, because the event will be able to provide more content to more attendees without getting even more insanely large than it already is.

But in other ways, this worries me. And I'm not simply worrying about people (like myself) who fall into both categories and who (probably like myself) will not be able to take off two full weeks to attend both conferences. Although that is a concern.

The real worry comes down to the cultural divide that separates software developers from the people who support the software that the developers build. These two groups of people speak different languages, have different priorities and have very little context in common, for all that we are all "computer people." As they say, developers are from Neptune, and IT pros are from Uranus.

So why does this worry me?

Well, whenever people (especially people from different factions) get together and meet face-to-face, and talk without the artificial barriers that so often divide them, they tend to "get it" and to get along. TechEd has always been a great place for me to meet up with my developer friends from around the world AND to meet up with my IT pro friends from around the world as well.

What will we (that's the collective "we" in case that doesn't come across clearly in text) do? How will these two interdependent but distinct communities get everything they need from each other without the "United Nations" function that TechEd has traditionally served?

Of course, there's no answer to this question today. Only time will tell...

You Can't Get There From Here

Yes Yukon![1]

I've had several people ask me about this offline in the last few weeks: Why can't I connect to SQL server 2008 from Visual Studio 2008? My answer, of course, has been "because it doesn't work in the current builds." It doesn't, but I haven't been in a position where I've needed to make it work, so I've been happy connecting to SQL 2008 from Visual Studio 2005.

My friend and colleague Ted Malone[2], however did need to figure this out, and was kind enough to share with me what he found: A pretty table!



So even if there isn't as much green as we might want, it's still good to know exactly where we can get from where. Now when are we going to see that next SQL 2008 CTP, eh?

[1] Say it phonetically. It's funny to me, but then again, I have had an awfully long week.
[2] Ted has a blog? Who knew?

PerformancePoint Server is Live!

Microsoft this morning announced that its performance management server product, PerformancePoint Server 2007 is now generally available. You can download evaluation copies from the Microsoft Download Center (32 bit and 64 bit versions are available) and can get the real deal from the MSDN and TechNet subscriber downloads.

Bad Math Day

I've always been fond of the quote "the lottery is a tax on people who are bad at math." It's sort of like "There are 10 types of people in the world - those who understand binary and those who do not," but without the geeky overtones.

But The Register today has a news story that really drives home someone having a bad math day: http://www.theregister.co.uk/2007/11/08/scratchcard_anarchy/

My favorite quote from the story:

"I phoned Camelot and they fobbed me off with some story that -6 is higher, not lower, than -8, but I'm not having it."


Right...

Wednesday, November 7, 2007

SSIS Myths - Phil May Have Beaten Me To It

Last week I posted a "call for myths" about SSIS. My schedule has been insane since then and I've found little time to follow up on this myself, but today I stumbled across a delightful source of misinformation about SSIS, on Oren Eini's blog: http://ayende.com/Blog/archive/2007/07/15/SSIS-15-Faults.aspx.

Hooray, I thought, here is a pre-built source of "myths" about SSIS that I can use instead of needing to build the entire list myself.

But sadly (for me, not for the SSIS community), Phil Brammer (SQL Server MVP and vocal member of the SSIS community) has already responded to Oren's attack:

http://www.ssistalk.com/2007/07/27/ssis-15-faults-rebuttal/#more-60

I may still use some of Oren's points later on, but for now, you should check it out and see what Phil has to say in response.

Sunday, November 4, 2007

Viewing Data Flow Metadata

This is a pretty simple tip, but I’ve seen a few different questions on the MSDN SSIS Forums lately where this technique was part of the solution, so I’ll post it despite its simplicity.

How do you view the buffer metadata for a given path in your data flow? How do you tell which columns are flowing through the logical data flow pipeline, and what data types and sizes each one uses?

The answer is: very easily!

Simply right-click on any data flow path arrow and select “Edit” from the context menu. Then select the Metadata tab and you will be presented with information that looks something like this:




Please note that you cannot edit the metadata here, as the buffer is defined based on the output properties of the component at the “root” of the current execution tree, but often simply being able to view the metadata is exactly what you need in order to understand the cause of a problem.


Of course, it’s important to understand just what you’re looking at when you view this metadata. What you’re seeing is only those columns in the data flow buffer that are “in scope” for the given data flow path arrow. If you view the metadata for the path immediately before a Derived Column transform, for example, you will not see the columns that it adds to the data flow, but if you view the metadata for the path immediately after, you will. While this may seem obvious and logical, it can also be misleading. The actual memory buffer that the data flow runtime creates for that execution tree contains all columns that exist at any point within the tree. This means that the columns added by the Derived Column transform physically exist in the buffer before the transform, but because they are not available to components upstream from the Derived Column transform they do not appear in the metadata viewer.

In any event, hopefully this tip is helpful and will make your SSIS troubleshooting efforts less painful. Enjoy!

Data Source Annotations

This post is something of a follow-up on the “Annotate like you mean it” best practice I mentioned in my best practices post last week. I always figured that the technique was something of a no-brainer, but when I mentioned it to Greg Low in our SQL Down Under conversation last week he seemed a little surprised (in a good sort of way) and if he hadn’t thought of it[1] odds are there are a few people out in SSIS-land who might not have thought of it either.

In my last post I talked about updating an SSIS package to use Common Table Expressions in the source query to replace a complex set of asynchronous transforms in the package’s data flow. The upside of this change was that the package performed better by several orders of magnitude. The downside is that the source query became more complex.

Now this isn’t inherently a bad thing, but it is undesirable in that it makes the package less self-documenting. In addition, because I tend to base my source queries on expressions[2], it is not always a simple task to figure out exactly what query is being executed. Because of this, and because self-description is an admirable goal for any development artifact, I will generally update my data flow with an annotation on the design surface that shows the query next to the data source component. In this scenario, the data flow ended up looking something like this:


As you can see, it is now much easier to understand the data being extracted than it would be to examine the data source component to find out on which variable its query was based, and then examine the variable to understand the expression on which it, in turn, was based.

The obvious drawback of this approach is that as soon as the annotation is created, it is instantly out of date. There is no way to create a “dynamic annotation” in SSIS[3], so there is no enforced relationship between the annotation and the data source component that it describes. This then puts something of a maintenance demand on the package developers, as they must remember to update the annotation each time the source query is updated.
I personally think that this is an acceptable tradeoff. The maintenance cost itself is low, and the benefits of having a well-documented package are considerable. Also, if you follow the best practice of “Get your metadata right first, not later” then you will likely not need to update the source query (or the annotation) very often.
[1] I have a great deal of respect for Greg’s knowledge and experience, so if there’s something that is new to him, it’s worth blogging about.
[2] Which will likely be a post of its own before too long.
[3] Although having annotations based on expressions would be a cool new feature, wouldn’t it?

I Want My CTE!

In my recent Best Practices post I mentioned that it was a best practice to “do it in the data source.” By this, I meant that when using SSIS to extract data from a relational database you can often get much better performance by completing tasks like sorting, filtering, grouping and aggregating in the source query as opposed to completing these tasks in the SSIS data flow. Now don’t get me wrong – the SSIS data flow performs exceptionally well, but it is not always the right tool for every job.

Here’s a specific case in point. I was recently helping someone troubleshoot a poorly performing SSIS package. The developer in question was pulling a variety of date values (different dates and times that described a business entity) from a single table in the source system and loading them into a time dimension in a data mart. As part of the data flow he was taking the seven different source columns and putting them together into a single column like this:

1) He used the Multicast transform to split the one data flow path into seven data flow paths, one for each date column.
2) He created a set of “DateValue” columns with the Derived Column transform, with each one based on one of the date columns from the source table, and replacing NULL dates with a placeholder date value.
3) He used the Sort transform to remove duplicates from each of the derived columns.
4) He used the Union All transform to bring the seven derived columns together into a single “DateValue” column.
5) He used another Sort transform to remove duplicates from the single data column.
6) He loaded the discrete dates into his dimension table.

It looked something like this:



And it performed about as well as you could imagine.

As you can probably imagine, he was a little frustrated, and was saying all sorts of bad things about SSIS which I won’t repeat here and which, for the most part, weren’t true.
And as you can probably also imagine, my solution for this problem was to perform the required logic not in the data flow, but in the source query. In fact, I said “this looks like a job for… Common Table Expressions!” That’s right – since the source system was using a SQL Server 2005 database, we could quickly and easily perform all of this logic in the SELECT statement used by the data flow, and eliminate 90% of the work being done in the data flow. The source query ended up looking something like this:

WITH FilteredDatesCTE
AS
(
SELECT TimeRaised
,TimeOfFirstEvent
,TimeOfLastEvent
,TimeResolved
,LastTimeStateWasModified
,TimeAdded
,LastModified
FROM SourceTable
WHERE LastModified >= '1/1/2007'
),
UnionedDatesCTE
AS
(
SELECT TimeRaised AS DateTimeString FROM FilteredDatesCTE UNION
SELECT TimeOfFirstEvent FROM FilteredDatesCTE UNION
SELECT TimeOfLastEvent FROM FilteredDatesCTE UNION
SELECT TimeResolved FROM FilteredDatesCTE UNION
SELECT LastTimeStateWasModified FROM FilteredDatesCTE UNION
SELECT TimeAdded FROM FilteredDatesCTE UNION
SELECT LastModified FROM FilteredDatesCTE
)
SELECT DISTINCT ISNULL(DateTimeString, '1/1/9999') AS DateTimeString
FROM UnionedDatesCTE;

Of course, if we were using a different database platform for the source system we could have used derived tables instead of CTEs, but the CTE syntax is cleaner and more elegant, and certainly does the trick.

The updated data flow was much simpler – none of the transforms shown above were required anymore – and it performed superbly, completing in minutes where the original data flow with all of its asynchronous transforms took over a day to complete with the same production data volume.

And as you’ve doubtless noticed as well, this example also ties in nicely with the “Don’t use asynchronous transforms” best practice.

So what’s the moral of the story? Yes, this time I actually do have one! The moral is that in order to be an effective SSIS developer, you also need to understand the capabilities of the platforms on which your packages are based. In this case (and in many, many cases) this means understanding T-SQL or some other SQL dialect. If you don’t have a thorough understanding of SQL, you’re going to be at a significant disadvantage when using SSIS. Part of it comes down to choosing the right tool for each job, part of it comes down to understanding the strengths and weaknesses of each tool.

Saturday, November 3, 2007

SSIS Raw File Viewer

SQL Server Integration Services has its own high-performance file format for persisting data - it's the "raw file." The downside of these files is that they can only be used from within an SSIS data flow - there is no other tool to open them.

Or is there?

Simon Sabin has developed a "Raw File Viewer" and has made it available for a free download - check it out here: http://sqlblogcasts.com/blogs/simons/archive/2007/01/11/SSIS-Rawfile-viewer---now-available.aspx

Yes, I realize that this is really old news as he posted this back in January (and it is now November) but as it has managed to slip under my radar for the last eleven months I figure that I'm not the only one who has missed it...

Thursday, November 1, 2007

Evangelists, Evangelists, Evangelists!

I consider myself an evangelist on SQL Server and BI technologies, but my friend and colleague Lynn Langit has this in her job title. She's a Developer Evangelists with Microsoft, and right now she's on the home page of the MSDN Channel 9 web site, talking about her job.

Check it out: http://channel9.msdn.com/Showpost.aspx?postid=352786

Edit: Damn, that's funny! I blogged this while I was downloading the WMV version of the webcast, before I actually watched it. Now I'm watching it, and laughing my head off.

CozyRoc SSIS Products

Although there is a "3rd party developer ecosystem" for SSIS components, it's not particularly active. I don't know if this is because there's no interest, if people just don't understand the platform well enough, or if the SSIS product group just did such a good job building the core platform that people don't see the need to turn to 3rd party vendors (or perhaps some other reason entirely) but I don't see a ton of add-on components out there.

But CozyRoc has just announced beta versions of a set of interesting-looking control flow and data flow components, including:

  • Package SSIS Connection Manager - Provides access to the current or different package at runtime.
  • Data Flow Source SSIS Data Flow Component - Very fast, non-blocking, in-memory data read from another data flow.
  • Data Flow Destination SSIS Data Flow Component - Exposes data from a data flow for in-memory read in another data flow.
You can check it out here (http://www.cozyroc.com/) - and they're free while in beta, so strike while the iron is hot.

SSIS Best Practices, Part 2

And as promised, here is my personal list of SQL Server Integration Services best practices. This is by no means comprehensive (please see my previous post for links to other best practice resources online) but it's a great starting point to avoiding my mistakes (and the pain that came with them) when you're working with SSIS. Here goes:
  • Get your metadata right first, not later: The SSIS data flow is incredibly dependent on the metadata it is given about the data sources it uses, including column names, data types and sizes. If you change the data source after the data flow is built, it can be difficult (kind of like carrying a car up a hill can be difficult) to correct all of the dependent metadata in the data flow. Also, the SSIS data flow designer will often helpfully offer to clean up the problems introduced by changing the metadata. Sadly, this "cleanup" process can involve removing the mappings between data flow components for the columns that are changed. This can cause the package to fail silently - you have no errors and no warnings but after the package has run you also have no data in those fields.

  • Use template packages whenever possible, if not more often: Odds are, if you have a big SSIS project, all of your packages have the same basic "plumbing" - tasks that perform auditing or notification or cleanup or something. If you define these things in a template package (or a small set of template packages if you have irreconcilable differences between package types) and then create new packages from those templates you can reuse this common logic easily in all new packages you create.

  • Use OLE DB connections unless there is a compelling reason to do otherwise: OLE DB connection managers can be used just about anywhere, and there are some components (such as the Lookup transform and the OLE DB Command transform) that can only use OLE DB connection managers. So unless you want to maintain multiple connection managers for the same database, OLE DB makes a lot of sense. There are also other reasons (such as more flexible deployment options than the SQL Server destination component) but this is enough for me.

  • Only Configure package variables: If all of your package configurations target package variables, then you will have a consistent configuration approach that is self-documenting and resistant to change. You can then use expressions based on these variables to use them anywhere within the package.

  • If it’s external, configure it: Of all of the aspects of SSIS about which I hear people complain, deployment tops the list. There are plenty of deployment-related tools that ship with SSIS, but there is not a lot that you can do to ease the pain related to deployment unless your packages are truly location independent. The design of SSIS goes a long way to making this possible, since access to external resources (file system, database, etc.) is performed (almost) consistently through connection managers, but that does not mean that the package developer can be lazy. If there is any external resource used by your package, you need to drive the values for the connection information (database connection string, file or folder path, URL, whatever) in a package configuration so they can be updated easily in any environment without requiring modification to the packages.

  • One target table per package: This is a tip I picked up from the great book The Microsoft Data Warehouse Toolkit by Joy Mundy of The Kimball Group, and it has served me very well over the years. By following this best practice you can keep your packages simpler and more modular, and much more maintainable.

  • Annotate like you mean it: You've heard of "test first development," right? This is good, but I believe in "comment first development." I've learned over the years that if I can't describe something in English, I'm going to struggle doing it in C# or whatever programming language I'm using, so I tend to go very heavy on the comments in my procedural code. I've carried this practice over into SSIS, and like to have one annotation per task, one annotation per data flow component and any additional annotations that make sense for a given design surface. This may seem like overkill, but think of what you would want someone to do if you were going to open up their packages and try to figure out what they were trying to do. So annotate liberally and you won't be "that guy" - the one everyone swears about when he's not around.

  • Avoid row-based operations (think “sets!”): The SSIS data flow is a great tool for performing set-based operations on huge volumes of data - that's why SSIS performs so well. But there are some data flow transformations that perform row-by-row operations, and although they have their uses, they can easily cause data flow performance to grind to a halt. These transformations include the OLE DB Command transform, the Fuzzy Lookup transform, the Slowly Changing Dimension transform and the tried-and-true Lookup transform when used in non-cached mode. Although there are valid uses for these transforms, they tend to be very few and far between, so if you find yourself thinking about using them, make sure that you've exhausted the alternatives and that you do performance testing early with real data volumes.

  • Avoid asynchronous transforms: In short, any fully-blocking asynchronous data flow transformation (such as Sort and Aggregate) is going to hold the entire set of input rows in memory before it produces any output rows to be consumed by downstream components. This just does not scale for larger (or even "large-ish") data volumes. As with row-based operations, you need to aggressively pursue alternative approaches, and make sure that you're testing early with representative volumes of data. The danger here is that these transforms will work (and possibly even work well) with small number of records, but completely choke and die when you need them to do the heavy lifting.

  • Really know your data – really! If there is one lesson I've learned (and learned again and again - see my previous blog post about real world experience and the value of pain in learning ;-) it is that source systems never behave the way you expect them to and behave as documented even less frequently. Question everything, and then test to validate the answers you retrieve. You need to understand not only the static nature of the data - what is stored where - but also the dynamic nature of the data - how it changes when it changes, and what processes initiate those changes, and when, and how, and why. Odds are you will never understand a complex source system well enough, so make sure you are very friendly (may I recommend including a line item for chocolate and/or alcohol in your project budget?) with the business domain experts for the systems from which you will be extracting data. Really.

  • Do it in the data source: Relational databases have been around forever (although they did not write the very first song - I think that was Barry Manilow) and have incredibly sophisticated capabilities work efficiently with huge volumes of data. So why would you consider sorting, aggregating, merging or performing other expensive operations in your data flow when you could do it in the data source as part of your select statement? It is almost always significantly faster to perform these operations in the data source, if your data source is a relational database. And if you are pulling data from sources like flat files which do not provide any such capabilities there are still occasions when it is faster to load the data into SQL Server and sort, aggregate and join your data there before pulling it back into SSIS. Please do not think that SSIS data flow doesn't perform well - it has amazing performance when used properly - but also don't think that it is the right tool for every job. Remember - Microsoft, Oracle and the rest of the database vendors have invested millions of man years and billions of dollars[1] in tuning their databases. Why not use that investment when you can?

  • Don’t use Data Sources: No, I don't mean data source components. I mean the .ds files that you can add to your SSIS projects in Visual Studio in the "Data Sources" node that is there in every SSIS project you create. Remember that Data Sources are not a feature of SSIS - they are a feature of Visual Studio, and this is a significant difference. Instead, use package configurations to store the connection string for the connection managers in your packages. This will be the best road forward for a smooth deployment story, whereas using Data Sources is a dead-end road. To nowhere.

  • Treat your packages like code: Just as relational databases are mature and well-understood, so is the value of using a repeatable process and tools like source code control and issue tracking software to manage the software development lifecycle for software development projects. All of these tools, processes and lessons apply to SSIS development as well! This may sound like an obvious point, but with DTS it was very difficult to "do things right" and many SSIS developers are bringing with them the bad habits that DTS taught and reinforced (yes, often through pain) over the years. But now we're using Visual Studio for SSIS development and have many of the same capabilities to do things right as we do when working with C# or C++ or Visual Basic. Some of the details may be different, but all of the principles apply.
In the days and weeks ahead I hope to have some more in-depth posts that touch on specific best practices, but this will have to do for now. It's late, and I have work to do...

[1] My wife has told me a million times never to exaggerate, but I never listen.

SSIS Best Practices, Part 1

One of the topics on which I've presented quite a bit lately has been that of SQL Server Integration Services best practices. The body of knowledge on how to best use SSIS is small compared to more mature development technologies, but there is a growing number of resources out there related to SSIS best practices.

And here it is. ;-)

In any event, here is a list of links to those online resources I have found related to SSIS best practices:

I'd also like to point out that there is a decent amount of overlap and a decent amount of contradiction between the various best practices referenced above. Not everyone agrees on what's best, and to a large extent, "best" is determined as much by the context in which you're using SSIS as it is by SSIS itself. Read all of these resources (and everything else, for that matter) with an open mind and apply their guidance with a healthy dose of pragmatism.

And although this doesn't fall into the same "lists of things to do and to not do" category as the links above, anyone who is interested in doing SSIS correctly needs to check out Microsoft's Project REAL (http://www.microsoft.com/sql/solutions/bi/projectreal.mspx) as well.

Of course, I have no intention of stopping here. (The whole "Part 1" thing in the title probably clued you in on this already, right?) At some point - hopefully today, but possibly as late as this weekend - I will be posting my own personal list of best practices, but I wanted to give credit where credit is due before doing so.

SSIS Performance Tuning Techniques

This article is neither new nor my own, but I've shown it to a few people lately and despite their having worked with SSIS for months, none of them had seen it.

http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx

This is a white paper that describes how the SSIS run time engine and data flow engine work, and how to take advantage of that knowledge to build better performing SSIS packages. It has excellent coverage of execution trees and memory buffers, and how they are used "under the hoods" in your data flow.

If you haven't read it before, please check it out today and tell a friend. It's required reading for anyone building SSIS packages in the real world.

Wednesday, October 31, 2007

Another Profound Quote

I'm participating in a day-long online meeting via phone, and someone just came out with this gem:
"No one should watch sausage or software being made."

It works on so many levels. It's wrong on so many levels. I love it.

PacMan is Loose on CodePlex!

PacMan - The SQL Server Integration Services Package Manager is a utility designed to permit batch operations on arbitrary sets of SSIS packages. Users can select a single package, a Visual Studio project or solution or a file system folder tree and then validate or update all selected packages in one operation.



And you can find it here: http://www.codeplex.com/pacman

PacMan is a utility that I developed primarily for my own use, and secondarily for use by members of my development team. Although I had always intended to share the PacMan utility with the SSIS community eventually, I kept pushing it back further and further since it wasn't "ready."

But my speaking schedule this month has convinced me that either it's never going to be ready, or else it's already ready. Maybe both. At various presentations this month in Stockholm, Sweden and here at home in Central New York I demonstrated PacMan to illustrate some of the things you can accomplish by using the .NET object model included with SSIS, and the response was an overwhelming "I want that!" from the seminar participants. I explained how this was a personal dev utility that wasn't really ready for prime time, and the interest didn't wane at all with that warning.

So here we are. I've done minimal cleanup, and the code isn't perfect (and there are very few comments, which is something that I would never allow to happen on a "real" project) but I could not find anything similar online, and I thought the best thing to do would be to share the code today instead of waiting to polish it before sharing it.

This is the first CodePlex project (or open source project in general) that I have started, so I honestly do not know how I am going to handle other project participants. I'll the first public alpha release is already available for download so everyone should be able to download the code to use (and tweak) on their own, but I don't currently know what I'll do when people want to contribute directly to the source code. I guess we'll find out when people ask.

Multi-Monitor Goodness

Do you use multiple monitors?

Do they have different resolutions?

Have you ever wanted to display different wallpaper images on each monitor, or to have the same image displayed appropriately (without cropping or stretching and the like) on both monitors?

Well, welcome to DisplayFusion (http://www.binaryfortress.com/displayfusion/)!

DisplayFusion is a free software utility that lets you quickly and easily manage multiple monitors. It does more than just give you control over the desktop wallpaper (although this is all I'm using it for today) - there's also a nice list of features related to managing window position across monitors with global hotkey support.

And did I mention it's free?

So if you answered yes to any of the questions above, you should download DisplayFusion and give it a try.

SSIS Myths - Call for Entries

I mentioned this to Greg Low during our SQL Down Under chat this morning, so I should probably mention it here in writing as well. I'm planning to start a series of "SSIS Myths" entries on this blog. What this will entail (based on what I see today, in any event) will be presenting a misconception about SSIS, and then digging into the details behind it and presenting the truth (or truths) behind the myth.

So although I have a few of these already in mind, I'd like to put out an open call for entries. If you have a myth in mind, if you've seen or heard statements about SSIS that you don't know if they're true, please let me know. Post a comment or send me an email[1] and I'll add it to my queue. Hopefully I'll have the first myth debunked by next week and will keep a steady stream of them coming your way in the weeks that follow, but in order to keep them coming I will need your help.

So let your voice be heard!


[1] Matthew MCT at Gmail dot calm - you can figure out what to do with it.