Friday, March 28, 2008

I Came, I Saw, I Coded

One week from today, on April 5th and 6th, 2008, the 9th semi-annual Code Camp will be held at the Microsoft New England District office in Waltham, MA. A code camp is something like a mini-conference, held on a weekend, and it's free.

That's right - it's free.


And why am I blogging about an event that's around 300 miles away from me? Because I'm going to be presenting three sessions on SQL Server Integration Services, that's why! I am scheduled to deliver these three exciting SSIS sessions:

Jumping Into the ETL Deep End with SQL Server Integration Services

If you’re a developer, you often run into situations where you need to get data from point A to point B, and often need to massage it along the way. Odds are you’ve seen the SSIS tools, but like many DBAs and developers you weren’t quite sure where to begin. If this sounds familiar, then this is the session for you. In this session you’ll learn what SSIS can do, how to make it do what it does, and what tools to use to make it work its magic. We’ll look at package development and deployment, and when the session is done you’ll be ready to build and deploy SSIS packages with confidence.

SQL Server Integration Services Development Best Practices

Are you tired of feeling like you’re making the same mistakes over and over again? Would you like to have a roadmap that outlines the pitfalls you’re likely to encounter when building ETL solutions with SSIS? Then this session is for you! In this session you’ll learn how to get the most of the SSIS tools and platform through a set of SSIS development best practices from a battle-scarred database and BI consultant who has survived the rough projects and lived to tell the tale.

SQL Server Integration Services Deployment Best Practices

You've built a set of SSIS packages to populate your data warehouse, and everything works great on your development environment - but what next? SSIS has a great set of tools, but when it comes to preparing for and executing a seamless deployment from development into test and finally into production, it's not always obvious what tools to use or how to best use them. In this session you'll learn how to prepare for and execute a seamless deployment of complex ETL solutions through a repeatable, testable process for both SQL Server 2005 and SQL Server 2008. If you're tired of saying "but it worked on my machine!" then this is one session you can't afford to miss.

And my three sessions are just a small part of the program: there are over 60 different sessions in all, on topics ranging from SQL Server to Windows Communication Foundation to Silverlight to Visual Studio to BizTalk. It looks like this will be a fantastic event!

Fantastic for the attendees, anyway. ;-)

Did you notice how I was being especially careful to say "I'm scheduled to present three sessions" instead of "I'm presenting three sessions"? There's a reason for that. The "Jumping Into the ETL Deep End with SQL Server Integration Services" session is scheduled for 10:35 Saturday morning, and there is a decent chance that I will not be able to reach Waltham in time. I'm going to be out on the west coast all next week, and my flight home is scheduled to arrive around midnight Friday night. Since it's a four-hour drive to Waltham under ideal circumstances, there's a pretty good chance that I will actually deliver two SSIS sessions and not all three.

But don't let this stop you! There are still around 75 attendee slots open (out of the available 500), so sign up now if you haven't already. And I'll see you there!

Update: The fantastic organizers of the Code Camp (thanks, Chris and Chris!) have reworked the schedule to accommodate my travel restrictions, so all three sessions should go as planned. Here are the updated times for my three sessions:

Sat 15:25-16:40 - Jumping Into the ETL Deep End with SQL Server Integration Services

Sat 16:50-18:05 - SQL Server Integration Services Development Best Practices

Sun 12:30-13:45 - SQL Server Integration Services Deployment Best Practices

Saturday, March 8, 2008

Heroes, Heroes Everywhere

Last week in Los Angeles was just the kick-off of Microsoft's worldwide "Heroes Happen Here" launch wave. Not only is Microsoft running a lot of major events, but they have also organized a series of Community Launch Events run by local user groups in cities all around the world.

And my local CNY Developers user group is holding its community launch event next Wednesday, March 12th. I will be presenting the SQL Server 2008 content, and group leader Andy Beaulieu (who has just returned from the MIX 08 conference in Las Vegas, so he should have lots of news and excitement to share) will be presenting content on Visual Studio 2008.

If you're in the Syracuse area, please plan on attending this event. Not only will you get the best SQL Server 2008 content this side of Redmond, but I will also be giving away a surprise gift to the person who asks the best[1] SQL Server 2008 question.

Also, there is a rumor that our Windows Server 2008 presenter will not be able to make it; if that is the case we'll have even more time to cover the really good stuff: SQL Server 2008.

I'll see you there!

 

[1] You want the gift, at least if you're a geek. But the judging of which question is best will be performed strictly by me. If you can stump me with a SQL Server 2008 BI question, you'll probably win, but the competition may well be fierce, so be prepared!

Wednesday, March 5, 2008

Express Yourself!

If you've read my blog very often, you know how much I love Expressions in SQL Server Integration Services. Expressions are just about the coolest thing since sliced data[1] and deliver incredible power to SSIS package developers.

But they're not always particularly intuitive. And although they are very powerful, they are not something with which developers interact directly and explicitly, or at least they don't often think of it that way. They think that they're working with precedence constraints or the Derived Column transformation or making task properties dynamic - Expressions just happen to be the tool that makes all of these amazing things possible. This can make it difficult to find help when it's needed, because people aren't searching for "SSIS Expression Syntax" - they're searching on something related to their task at hand.[2]

Fortunately, there are some good resources available online. The first five links below are actually from SQL Server Books Online (BOL), so everyone who has SSIS already has this information, but for some reason no one seems to be able to locate them.

  • Integration Services Expression Reference: This is the top-level topic, the entry point into the expression documentation in BOL.

  • Integration Services Expression Concepts: This topic describes the main concepts involved with SSIS expressions, including the expression syntax, how to handle data type conversions and the supported literals and identifiers in the expression language.

  • Integration Services Expression Operators: This topic describes each operator supported by the SSIS expression syntax.

  • Integration Services Expression Functions: This topic describes each function supported by the SSIS expression syntax. With the exception of the conditional operator this is pretty dry reading, but good stuff to know nonetheless.

  • Advanced Integration Services Expressions: This is where the BOL content starts to get really good, listing a set of real-world expressions and explaining how they're constructed. Unfortunately, it doesn't quite go far enough.

  • SQLIS.com Wiki: This excellent community resource picks up where the Advanced Integration Services Expressions page leaves off. It covers a wider variety of real-world expressions, many more common scenarios, and many more samples than BOL includes. It's delightful, and the information it contains should be part of every SSIS developer's toolbox.

So take a few minutes to see what's out there. The more you know about SSIS expressions the more you're going to love them, and the more powerful you will be when working with SSIS.

    [1] Whatever that means...

    [2] Which, of course, is why I included the names of those other tasks above - pretty clever, eh? Maybe we'll get more "Windows Live Search" hits that way, eh?

    Which Package Was That?

    One of my biggest complaints about the DTEXEC utility included with SSIS is that its output does not include the name of the package that produced it. Obviously, if you're only running a single package this is not a problem, but many real-world SSIS scenarios (including 99.99% of mine) involve the execution of master packages which in turn execute other packages by using the Execute Package task.

    To make matters worse, if you use template packages (which is generally the case, because there is not much of a reusability story in SSIS) then you probably have the same set of tasks with the same names in each package. So when you get this error message output[1] from DTEXEC...

    Error: 2008-03-03 19:44:42.21
       Code: 0xC0024104
       Source: Script Task - Set Package Variables 
       Description: The Execute method on the task returned error code 0x80004003 (Attempted to read or write protected memory. This is often an indication that other memory is corrupt.). The Execute method must succeed, and indicate the result using an "out" parameter.
    End Error

    ...how are you supposed to figure out which package has failed?

    The short answer is "with great difficulty."

    But with a little basic work inside your packages (and you're probably doing something similar to this already, so this could well be work that you have already performed) the slightly-longer answer could be "really easily."

    So what do you need to do to make this diagnostic task simple? You need to have implemented some sort of logging in your packages. All of my production packages have a logging pattern that looks something like this (a rough T-SQL sketch follows the list):

    1. An Execute SQL task that runs first, creating a new record in an audit table in a SQL Server database, and returning the identity value for the new record, which is then used later on in the package execution. This audit table contains a "StartTime" column which tracks when the package execution started.
    2. Do whatever "real work" the package does, using whatever tasks are needed.
    3. An Execute SQL task that runs last, and which updates the record in the audit table that was created earlier, including setting a value for the "EndTime" column in the audit table.
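    To make this pattern concrete, here is a minimal T-SQL sketch of what the two bookending Execute SQL tasks might run, assuming an OLE DB connection manager (where task parameters map to the ? placeholders). The table and column names such as PackageExecutionAudit, AuditID and PackageName are hypothetical placeholders for illustration only - adjust them to match your own audit schema.

    -- Hypothetical audit table; all object names here are placeholders
    CREATE TABLE dbo.PackageExecutionAudit
    (
        AuditID     int IDENTITY(1,1) PRIMARY KEY,
        PackageName nvarchar(255) NOT NULL,
        StartTime   datetime NOT NULL,
        EndTime     datetime NULL
    );

    -- Step 1: the first Execute SQL task creates the audit record and returns
    -- the new identity value so it can be stored in a package variable
    -- (? parameters are mapped to SSIS variables such as System::PackageName)
    INSERT INTO dbo.PackageExecutionAudit (PackageName, StartTime)
    VALUES (?, GETDATE());
    SELECT CAST(SCOPE_IDENTITY() AS int) AS AuditID;

    -- Step 3: the final Execute SQL task closes out the audit record
    UPDATE dbo.PackageExecutionAudit
       SET EndTime = GETDATE()
     WHERE AuditID = ?;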

    So do you have something like this in place already? If so, read on! If not, you may as well stop reading now, and start thinking about implementing something like this in your packages.

    Because once you have this in place, all you need to do is run a simple SELECT statement like this:

    SELECT *
      FROM [AuditTable]
    WHERE [StartTime] < '2008-03-03 19:44:42.21'
    ORDER BY [StartTime] DESC

    This will give you a list of packages that have executed up to and including the package that logged the error, with the offending package at the top. You could also add a TOP 1 to your SELECT statement if you only wanted to see the one package that logged the error, but I have often found it useful to see what packages have run before the package with the problem, especially when there is logic in the master package that would dynamically run (or not run) the worker packages.
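    For the record, the TOP 1 variant would look something like this:

    SELECT TOP 1 *
      FROM [AuditTable]
     WHERE [StartTime] < '2008-03-03 19:44:42.21'
     ORDER BY [StartTime] DESC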

    This is obviously not really high tech, but it's a useful little trick that will hopefully save you some time. Enjoy!

     

    [1] Or any other error message - this is just the one that was leaping out at me today. I've been meaning to post on this little tip for months, and this just happened to be the error message that broke the blogger's back, as it were...

    What's Best?

    I was flying through Cincinnati yesterday, and while waiting for my connecting flight I overheard (I wasn't trying to eavesdrop, but the guy was standing right next to me and talking loudly and clearly) a phone conversation about various aspects of IT consulting. During the conversation he came out with this great quote:

    "There's no such thing as best practices - only best practitioners."

    Ok, I don't know if I really agree with this 100% (although I did smile and say "that's right!" to the man on the phone) but it is certainly an interesting idea that raises a whole set of interesting questions related to knowing what's "best" for a project.

    Obviously, best practices exist, but they are generally just rules of thumb. They're general guidelines to follow in common situations, but they're not a panacea. In order for them to be applied to greatest effect, the person following the best practices needs to understand not just what to do, but also why.

    In order to really get the best, you need not only best practices, but you also need the right people to implement them - the best practitioners, as it were. This is because context is practically everything when it comes to determining meaning, and in order to understand the context of a best practice recommendation, the practitioner needs to first understand two primary things:

    1) The business context of the project - without understanding the real problems that need to be solved, no one can select an ideal solution.

    2) The technology platform on which the solution will be implemented.

    These two factors are vital when applying documented best practices because the best practices are generally presented as a set of prescriptive guidance - do this, don't do this, also do this - without a great deal of explanation. There's rarely[1] information about why a given step/task/configuration is considered "best." Because of this, the practitioner needs a solid understanding of how the technology platform works - this will let him interpret the best practice guidance and decide whether a given "best practice" recommendation applies to his current project context.

    So what's best?

    The answer of course is "it depends." The answer always depends on the context in which the question is asked.

    But when you're looking for the best solution to your vexing problems, you need not only best practices, but you also need the right people to implement them. I've seen too many companies over the years who try to save money by only hiring "junior" personnel, but who then expect these inexperienced folks to deliver top-quality software on aggressive schedules. Although sometimes this can work, it's never repeatable and generally only succeeds through great personal sacrifice on the part of the team members.

    The next time you're starting a project, think about the skills you're going to need, and realize that sometimes there's no substitute for the experience and deep knowledge of a "best practitioner."[2]

     

    [1] Although this is beginning to change as technologies mature - the patterns & practices group at Microsoft does a great job here.

    [2] No, I'm not trying to advertise or promote myself or anyone else. I'm booked solid (and then some) for as far as I can see, and don't have any ulterior motives behind the post.

    Tuesday, March 4, 2008

    New to SSIS? Start Here!

    I must admit that I've been known to say some negative things about the SQL Server Integration Services documentation from time to time. But one part of the SSIS content in SQL Server Books Online stands out as an exception: the "Creating a Simple ETL Package" tutorial. It is probably the single most valuable beginner's resource for working with SSIS, and yet so many people who need this information have never heard of it.

    So what is it?

    This five-part hands-on tutorial provides an overview of the SSIS tools in Visual Studio and the capabilities of the SSIS platform. It uses the AdventureWorks sample databases and a set of text files that are included with the Books Online install, and walks users through five lessons, each of which builds on the ones before:
    1. Creating the Project and Basic Package: Build an SSIS package that extracts data from a text file and loads it into a SQL Server table, complete with explanations not only of what to do but why you're doing it.
    2. Adding Looping: Update the package to loop through all of the flat files in a folder and load their data into the table.
    3. Adding Package Configurations: Add a package configuration (a very important technique to learn!) so that the path to the folder where the text files are stored can be updated outside of the package.
    4. Adding Logging: Update the package to log execution information to a text file.
    5. Adding Error Flow Redirection: Update the package to redirect "bad data" to a text file for later analysis.

    Sounds great, doesn't it? While these techniques cover only a small portion of what SSIS can do, they do cover a representative subset and give users much-needed familiarity and comfort with the SSIS tools in Visual Studio.

    And best of all, while the text files used in the lessons are included in your local SQL Server installation, you can find the tutorial instructions on MSDN here: http://msdn2.microsoft.com/en-us/library/ms169917.aspx

    Take an hour or two and go through these tutorials - you won't regret it!

    Sunday, March 2, 2008

    Jamie Thomson on the Data Profiling Task

    SSIS Junkie Jamie Thomson has once more made the rest of the SSIS community look lazy by posting a multi-part[1] series of blog posts on the new Data Profiling Task in the February CTP of SQL Server 2008. If you're at all interested in data quality, definitely check it out:

    http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-1-introduction.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-2-column-length-distribution.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-3-column-null-ratio.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-4-column-pattern.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-5-column-statistics.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-6-column-value-distribution.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx

    http://blogs.conchango.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx[3]

    I promise I was going to do this really soon[2] but since Jamie has done such an excellent job, I guess now I won't need to be bothered. Thanks, Jamie!

    [1] This was originally "five-part" but Jamie has since added two more. I don't know if he's done yet, so I'll leave it at that.
    [2] Boy, that's weird - Blogger.com doesn't support the new "blatant lie" HTML tag ;-)
    [3] Part 8 added on March 4