Friday, June 27, 2008

Looking for a Date - What's in a Name?

I've blogged about using SSIS to create text files with dynamic file names before, but there's another scenario that keeps coming up time and again. It looks something like this:

"I need to export data from a database into a text file. I have this working, but I need the text file to have a name that's based on the current date, like MyOutputFilePrefix_YYYY-MM-DD-HH-MM-SS.txt. How do I do this?"

It should come as no surprise that the short answer is "use expressions." Expressions are (in my humble opinion, of course) the single most powerful feature in the SSIS developer's toolbox, and are the primary mechanism for adding dynamic functionality to SSIS packages.

But of course "use expressions" doesn't give you all the information you need, so here is a quick walkthrough that shows all of the steps involved.

  1. Start off by creating or locating a flat file that has the format that you need. This will give you a head-start for configuring your Flat File Connection Manager later on. Since I'm working on a simple demo, my text file is appropriately simple:

    01
  2. Next, add a Flat File Connection Manager to your SSIS package, and update its properties to reference the sample flat file you created above:

    02
  3. Make sure that the columns you defined in the file are being parsed correctly:

    03
  4. And update the properties (generally the data type) of the columns as needed. In this scenario, we're setting the data type of the ID column to be a four-byte signed integer:

    04
  5. Next, add an OLE DB Connection Manager (or whatever makes sense for your source database) to your package, configure it as necessary[1], and add a Data Flow task to the package as well:

    05
  6. Inside the Data Flow task, add an OLE DB Source component (or whatever makes sense for your source database) and configure it with your source query:

    06
  7. Next, add a Flat File Destination component to the data flow and connect it to the source component:

    07
  8. Configure the Flat File Destination to use the Flat File Connection Manager and ensure that its column mappings are configured correctly:

    08a
    08b
  9. Run the package to make sure that it works:

    09
  10. And test it by checking the contents of the target text file:

    10
  11. And now we're ready to start. That's right, the last 10 steps were included to show how I got to the starting point for adding the dynamic functionality that is the whole point of the post. What we've done above is set up a simple package and data flow that exports data to a text file, which is (I assume) where the people who keep asking the question at the top of the post are getting stuck. And since we walked through it step-by-step together, everyone can see that there is no prestidigitation involved.
    So let's start off by creating three different package variables, FolderName, FileName and FilePath:

    11
  12. Set the value property of the FolderName variable to the path of the folder where you want your files to be created:

    12
  13. Set the EvaluateAsExpression property of the FileName variable to True and set the Expression property to this expression:

      "MyOutputFilePrefix_" + (DT_WSTR,4)YEAR(@[System::StartTime]) + "-"
        + RIGHT("0" + (DT_WSTR,2)MONTH(@[System::StartTime]), 2) + "-"
        + RIGHT("0" + (DT_WSTR,2)DAY( @[System::StartTime]), 2) + "-"
        + RIGHT("0" + (DT_WSTR,2)DATEPART("hh", @[System::StartTime]), 2) + "-"
        + RIGHT("0" + (DT_WSTR,2)DATEPART("mi", @[System::StartTime]), 2) + "-"
        + RIGHT("0" + (DT_WSTR,2)DATEPART("ss", @[System::StartTime]), 2) + ".txt"

    This will evaluate to a filename like MyOutputFilePrefix_YYYY-MM-DD-HH-MM-SS.txt, as requested above. [2]

    13a

    13b 

  14. Set the EvaluateAsExpression property of the FilePath variable to True and set the Expression property to this expression:

    @[User::FolderName] +  @[User::FileName]

    14 
  15. At this point we have a variable, FilePath, that will always evaluate to a fully qualified file path that is dynamically updated to reflect the time the package started executing. This is exactly what we need, right? So the one thing that remains to be done is to use this value to create the files being exported by the data flow. And to do this we just need to update the Flat File Connection Manager to base its ConnectionString property on this expression:

    @[User::FilePath]

    15a
    15b
  16. Then, run the package a few times, and take a look in the output folder:

    16 

Pretty simple, eh? Once we had the base package (the starting point for the whole exercise) set up, it was simply a matter of setting a few property expressions to make the whole thing work as needed.

It's also worth noting that the three-variable technique for building the file path is completely optional - you could just as easily build a single (although admittedly more complex) expression for the ConnectionString property, and skip the variables entirely. I personally prefer having variables to contain the intermediate values that go into the file path, because in most situations there are opportunities to reuse those intermediate values, and having them stored in variables makes that reuse very simple.
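
If you do decide to go the single-expression route, the ConnectionString expression would look something like this - note that the folder path here is just an example, and that backslashes need to be escaped in SSIS string literals:

"C:\\OutputFiles\\MyOutputFilePrefix_" + (DT_WSTR,4)YEAR(@[System::StartTime]) + "-"
    + RIGHT("0" + (DT_WSTR,2)MONTH(@[System::StartTime]), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DAY(@[System::StartTime]), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DATEPART("hh", @[System::StartTime]), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DATEPART("mi", @[System::StartTime]), 2) + "-"
    + RIGHT("0" + (DT_WSTR,2)DATEPART("ss", @[System::StartTime]), 2) + ".txt"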

And of course, you can easily update the FileName expression as needed to get the file name you require. The SQLIS.com wiki has a page dedicated to date expressions, and I will generally start there whenever I need to write a date expression myself. In fact, you'll probably notice that the expression I used for the FileName variable was blatantly stolen (and then modified just a little) from one of the examples on the SQLIS wiki site.

So once again we've witnessed the power of expressions in SSIS. By setting just a few properties to evaluate as expressions, we've quickly and easily added dynamic functionality to a package. What could be better?

 

[1] Since the purpose of this post is to focus on the dynamic file name, I'm being deliberately vague here. This is stuff that you should know already, so if you don't, check here: New to SSIS- Start Here!

[2] Although this example is using the @[System::StartTime] variable, you can use any date variable (or function, like GETDATE) for your expressions - use the one that makes the most sense for you.

Wednesday, June 25, 2008

BIDS Helper and SSIS Deployment

Back in November I posted about the 1.2 release of the amazing BIDS Helper project on CodePlex and the new SSIS-related features it included. Well, the folks who contribute to the BIDS Helper project have been hard at work getting ready for their 1.3 release, and today they checked into TFS the first set of features designed to provide real-world deployment capabilities from within the Visual Studio (BIDS) development environment.

Now, not only can you create the standard deployment manifest[1], you can also deploy automatically and (here's the exciting part!) have BIDS Helper automatically create a batch file that calls DTUTIL each time you build your project.

How cool is that!

The reason I'm so excited about low-tech batch files is that they are incredibly simple to integrate into any deployment process, and they're completely hands-free. They may not be sexy, but they're close to perfect nonetheless.
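
To give you an idea of what I mean, the generated batch file is essentially a series of DTUTIL command lines, one per package - something along these lines. The package paths, folder and server name below are made up for illustration, and the actual file that BIDS Helper produces may use different options:

REM Deploy each package in the project to the MSDB package store on the target server
dtutil /FILE "C:\Projects\MyEtlProject\LoadDimensions.dtsx" /COPY SQL;"SSIS Packages\LoadDimensions" /DESTSERVER "MyServer" /QUIET
dtutil /FILE "C:\Projects\MyEtlProject\LoadFacts.dtsx" /COPY SQL;"SSIS Packages\LoadFacts" /DESTSERVER "MyServer" /QUIET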

Take a look here for more information: http://www.codeplex.com/bidshelper/Wiki/View.aspx?title=Deploy SSIS Packages

The bad news is that version 1.3 is not yet ready for release, so if you want to start taking advantage of this functionality you will need to download the BIDS Helper source code and compile it yourself. Which I am now about to do...

[1] Which seems like a good start, but quickly becomes a dead-end, because it is dependent on a user clicking and typing every time it is used, which makes it very undesirable in a real-world deployment scenario.

SSIS and SQL Server Agent - Choosing the Right Job Step Type

My "SSIS and SQL Server Agent" post is the most frequently accessed post on my blog, or at least it has been for the two months that I've actually been tracking site usage. But some recent posts on the SSIS Forums on MSDN have made me realize that it was probably long overdue for a follow-up focused choosing the correct job step type.

Microsoft has provided two different SQL Server Agent job step types that you can use to execute SQL Server Integration Services packages:

  1. The "SQL Server Integration Services Package" job step type
  2. The "Operating system (CmdExec)" job step type.

Now which one leaps out at you as being ideal for running SQL Server Integration Services packages?

Unfortunately, despite the name and the oh-so-friendly GUI through which you can specify all sorts of options, the SQL Server Integration Services Package job step type is not the one you want to use. There are quite a few reasons I believe this (and it's worth pointing out that while there are people who agree with me, there are some very talented SSIS people out there who do not) including:

  • Consistency: You can create a batch file that calls DTEXEC, test that execution outside of SQL Server Agent, and then copy and paste the DTEXEC command line into the SQL Server Agent job. This minimizes the number of environmental variables[1] that change between testing package execution outside of SQL Server Agent and testing package execution within a SQL Server Agent job.
  • Discoverability: I realize that this won't apply to everyone, but I personally find it much simpler to look at command line switches and options than to look at nine different tabs in a tabbed window. Looking at the command line I know that I'm seeing everything. Looking through the tabs, I'm pretty sure I'm going to miss something.
  • Troubleshooting and Error Output: This is the big one. When you run a package through the SQL Server Integration Services Package job step type, you're limited to the error logging that is already configured within the package itself, and this isn't always sufficient to diagnose and resolve problems with the job. But when you execute a package through DTEXEC and the CmdExec job step type, you have more options that make tracking down and resolving problems much simpler.

So how do you do it? And do you really want to, if you're not one of those weird people who would rather have a command line than a GUI?

Well, the answer to the second question here is yes. Yes you do. Because even though you're using the CmdExec job step type, you still have a GUI to use. And here's how.

  1. Double-click on the package you want to run.[2] This will open up DTEXECUI, the Execute Package Utility.
  2. In DTEXECUI, specify all of the options that you need by using its nice friendly GUI.
  3. On the Command Line tab in DTEXECUI, highlight and copy the Command line text - these are the switches and options that need to be specified for DTEXEC to duplicate the options you selected in the GUI:

    Use DTEXECUI to build your command line
  4. In your SQL Server Agent job, add a new job step, and specify the job step type as CmdExec. In the Command text box, type DTEXEC and a space, and then paste the command line text that you copied from DTEXECUI earlier[3]:

    Paste the command line text into your job step
  5. On the Advanced tab of the New Job Step dialog box, enter a file path and name in the Output File text box. (This is the big thing.) With this specified, the output from DTEXEC, which is exactly what you want when troubleshooting problems, will be written to a text file. You can also specify whether the file should be overwritten or appended to with each execution.

    Specify an output file for the job step

Now, when you execute your SQL Server Agent job, all DTEXEC output will be written to this text file. And if something goes wrong, you now have a single, easy to use location for tracking down and resolving the problems.
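
For reference, the command you end up with in the CmdExec job step typically looks something like this - the package path, configuration file and reporting options shown here are just placeholders for whatever DTEXECUI generates for your package:

DTEXEC /FILE "C:\SSIS\Packages\LoadWarehouse.dtsx" /CONFIGFILE "C:\SSIS\Config\Production.dtsConfig" /REPORTING EWP

The /REPORTING switch controls how much information DTEXEC writes to its output, which is exactly what ends up in the output file you specify on the Advanced tab.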

One of the most common things that the MVPs and other people who help out on the SSIS forums request is "the complete set of error messages and warnings from the package execution."[4] The reason we ask for this is that it's pretty much impossible to remotely troubleshoot someone else's problems without complete and explicit output. And the greatest thing about using the CmdExec job step type is that all of this information is now in one place. Copy and paste the errors and warnings, and help is (hopefully) on your way.

 

[1] Not Windows environment variables - I'm talking here about "things that are important to your deployment environment."

[2] If your package is stored in SQL Server and not on the file system, you can still use this basic technique, but you will have to browse to and select the package from within DTEXECUI, as opposed to double-clicking on the package on the file system.

[3] You can also specify the fully qualified path to DTEXEC.EXE - this is helpful if DTEXEC is not on the path, or if you're on a 64-bit server and want to explicitly select the 32- or 64-bit version of DTEXEC for this package.

[4] And we're not the only ones who ask for this. Your in-house support folks will probably ask for the same thing if you have a support team who works with SSIS.

Monday, June 23, 2008

File Name Expressions

A recent post on the SSIS Forums on MSDN reminded me how counterintuitive it can sometimes be to work with SSIS expressions. For example, it's quite common (such as when using the File enumerator for the Foreach Loop container) to have a package variable that contains the fully-qualified file name of a file. But what if you need only a portion of the file name?

Well, the obvious answer (as I mentioned in an earlier post on expressions) is to look at the SQLIS.com Wiki for expression examples, because there are some good examples up there, including the first two shown below. Each example includes sample output for this sample input:

C:\Projects2008\RC0_SSIS_Test\RC0_SSIS_Test\FileExpressions.dtsx

Get the filename from a full file path:

RIGHT( @[User::FileName], FINDSTRING( REVERSE( @[User::FileName] ), "\\", 1 ) - 1 )

Output: FileExpressions.dtsx

Get the directory from a full file path:

SUBSTRING( @[User::FileName], 1, LEN( @[User::FileName] ) - FINDSTRING( REVERSE( @[User::FileName] ), "\\", 1 ) )

Output: C:\Projects2008\RC0_SSIS_Test\RC0_SSIS_Test

But what if your needs aren't covered there? Well, here are a few more examples:

Get the file extension from a full file path or file name:

RIGHT( @[User::FileName], FINDSTRING( REVERSE( @[User::FileName] ), ".", 1 ) - 1 )

Output: dtsx

Get the file name minus the extension from a full file path:

SUBSTRING (@[User::FileName], LEN( @[User::FileName] ) - FINDSTRING( REVERSE( @[User::FileName] ), "\\", 1) + 2,  LEN (RIGHT( @[User::FileName], FINDSTRING( REVERSE( @[User::FileName] ), "\\", 1 ) - 1 ) ) - FINDSTRING( REVERSE( @[User::FileName] ), ".", 1 )  )

Output: FileExpressions
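
And here's one more variation on the same theme, in case it saves someone a few minutes:

Get the full file path minus the extension:

SUBSTRING( @[User::FileName], 1, LEN( @[User::FileName] ) - FINDSTRING( REVERSE( @[User::FileName] ), ".", 1 ) )

Output: C:\Projects2008\RC0_SSIS_Test\RC0_SSIS_Test\FileExpressions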

Expressions are probably the single most powerful tool in the SSIS developer's toolset, so the more you use them the more you're likely to love them. But sometimes having a few good examples is the best way to get over the initial learning curve.

Friday, June 20, 2008

BI for the Virtual Guy

Yeah, I know I'm stretching for the title of this post, but my allergies are killing me today and this is really the best I can do at the moment.

So what's this "Virtual" bit all about. It's about the first-ever SSWUG Virtual Business Intelligence Conference, coming this September to a computer near you. I'm in the process of helping to organize the conference and I've been having a blast. We have quite a few great BI speakers lined up and are in the process of finalizing the lists of speakers and sessions.[1]

But why do I mention this today?

SSWUG is about to kick off its second SQL Server Virtual Conference next week - it's being held from June 24 to June 26. And I'm excited that I get to attend. The reason I'm excited about attending is the very cool format of the conference - check this out:

http://www.vconferenceonline.com/sswug/demo.asp

The virtual conference format pulls together the best of "real" in-person conferences, along with the best of an online learning experience. The video on the page above shows footage from the first SQL Server Virtual Conference[2] and although the video has catchier music than you're likely to get during the actual sessions, it gives you a great taste of what the virtual conference format has to offer, such as:

  • High-definition video
  • High-quality audio
  • Real-life views of the speaker, the slides and the demos
  • Everything delivered online - no need to leave your desk

And there are some things that don't come through in the sample video, but you also get:

  • Interactive online chats and Q&A with the speakers
  • Downloadable PowerPoint slide decks
  • Downloadable demo files

But what is most exciting to me is that on top of all of these things, the virtual conference is flexible. The conference is three days long, and there are three sessions going on at any given time. At a traditional conference you would need to decide which session to attend, and the other two would be unavailable. Here, each day's content is repeated three times throughout the day, so if you don't catch a session the first time around, you can go back and view it later. This gives you the flexibility you normally only get with online content, plus the interactivity (remember the Q&A with the speakers?) that you normally only get with an in-person conference. And to make things even better, if none of the primary times work for you (or if you get called into emergency meetings, as is often the case in IT) you also have the option to view any of the conference sessions on demand for the two weeks following the conference.

When I was first contacted about participating in the Virtual Business Intelligence Conference, I was more than a little skeptical. I pictured something like ReadyTalk or GoToMeeting, with poor quality audio and video, and few options for interacting with the speakers and with other attendees. Instead, I found a platform that really seems to bring together the best of the physical and virtual formats, and I was delighted to get involved.

If this sounds interesting to you too, please stay tuned. I'll post more information over the weeks ahead, as we move closer to the conference dates from September 24 to September 26. So if you have any questions, please let me know, and please spread the word as well. There are a lot of people out there who can benefit from this content, and not everyone can make the trip to Orlando for TechEd or to Seattle for the Microsoft BI conference, so please help let them know that there is another option out there.

UPDATE 24 June, 2008:

It always weirds me out when I learn that people actually read my blog. I get enough information from Google Analytics to know that I get a decent number of hits each day, but that's not the same as getting an email from a real person about a post.

Like the email that I got today.

This morning I received an email from the account manager responsible for the conferencing solution that I use for work, wanting to know how she could work with me to resolve those problems. (How's that for great customer service? ;-) Reading back through this post I can see how it could be interpreted as "these conferencing systems are bad" as opposed to the "these conferencing systems are not ideal for the virtual conference format" message that I had intended. Here is the relevant part of my response to her, hopefully to eliminate the negativity that slipped into the post.

I didn’t mean to imply that I had any issues with ReadyTalk, LiveMeeting or GoTo Meeting for their stated purposes; simply that I don’t believe that they are ideal for the virtual conference format. The issues that I see are:

  • Attendee join/leave – Most conferencing systems will announce each attendee by name and/or with a beep when the attendee joins or leaves the conference. For a conference call this functionality makes sense, but for a virtual conference this would be very disruptive.
  • Audio Quality – The quality of the presenter audio in a conference call is dependent on the quality of the presenter’s phone line and the background noise in the presenter’s location, usually a home or office. Obviously this isn’t a problem with the conferencing system, but for a virtual conference it would negatively impact the attendee experience when compared to audio recorded with a professional microphone in a studio and then tweaked as needed before broadcast.
  • Time-bound interaction – In a conference call, the only time that attendees can chat is during the session itself, and because the presenter is running the slides and/or demos, the attendees are limited to asking questions to (and getting answers from) helpers other than the presenter. Although the video may be recorded and made available for on-demand viewing, this is a unidirectional viewing; there are no options available for participants to interact. Again, this makes sense for a conference call, but for a virtual conference, it would be ideal to have the core presentation (slides, demos and presenter) and the attendee interaction/chat available independently, enabling the speaker to answer questions while the recorded session is being viewed.

I hope this helps clear up any confusion I may have caused!

[1] I know that quite a few strong BI speakers read this blog - if you're interested in being considered as a speaker, feel free to post a comment or send me an email over the next few days.

[2] Which was a huge success, which is why SSWUG is organizing another one and branching out to cover BI, SharePoint and .NET as well.

Splitting a Delimited Column in SSIS

A recent post on the MSDN SSIS Forums has reminded me that I've been meaning to post on this technique for quite a few months now. The basic problem looks like this: The input rows in a data flow contain a key field and a "list field" whose contents are made up of a delimited list of values. Like this:

KeyField    ListField
1           a,b,c
2           b,c
3           b,d,e
4           c,f,h
5           a,b
  

And the output needs to look like this:

KeyField   ListItem
1          a
1          b   
1          c
2          b       
2          c
3          b
3          d
3          e
4          c
4          f
4          h
5          a
5          b

The bad news is that there isn't any magical "Split Transformation" in SSIS. There are enough variations on data types and delimiters that it would probably be difficult to implement this as a transformation component. The good news is that it is trivial to implement this as a Script Component within the package. Here's how:

  1. Start by adding a data flow task to your package, and adding an OLE DB Source component (that returns the input data above) to the data flow. Then add a Script Component to the data flow as well.
  2. When prompted to select the type of component you want to create with the Script Component, select Transformation - it's the default, so this should be super easy.

    01-SelectType
  3. Next, connect the data flow path arrow from the Source component to the new Script transformation. Because the SSIS data flow is so heavily dependent on metadata, we can't really do anything meaningful with the Script Component until it knows what input data it will receive.

    02-HookUpPath
  4. Next, right-click on the Script Component and select Edit. In the Script Transformation Editor dialog box, select both columns to make them available as input columns so they are available in the Script code.

    03-Inputs
  5. Next comes the one "tricky" part: marking the transformation as asynchronous. Because the transformation will be producing more output rows than it receives input rows, this is a vital step that is sometimes missed by inexperienced SSIS developers. Mark the transformation as asynchronous by setting the SynchronousInputID property of the output to None.

    04-MakeAsynch
  6. Now, because we are building our own output buffer (one of the characteristics of an asynchronous data flow component) we need to add the columns that the output buffer will contain. Select the transformation's output, click Add Column and set the properties (mainly the name and data type) appropriately.

    05-BuildOutput
  7. Finally, what we've all been waiting for: the code. Click on the Design Script button and add the code to perform the actual split.

    06-DesignScript

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim keyField As Integer = Row.KeyField
        Dim itemList As String = Row.ListField
        Dim delimiter As String = ","

        If Not (String.IsNullOrEmpty(itemList)) Then

            ' Split the delimited list into an array of individual values
            Dim inputListArray() As String = _
                itemList.Split(New String() {delimiter}, _
                StringSplitOptions.RemoveEmptyEntries)

            ' Add one row to the asynchronous output per value,
            ' repeating the key column for each row
            For Each item As String In inputListArray
                With Output0Buffer
                    .AddRow()
                    .KeyField = keyField
                    .ListItem = item
                End With
            Next

        End If

    End Sub

  8. Once the code is in place, the last thing we need to do is to add a Row Sampling transformation (this one is handy because if it has no output data paths, it doesn't really do anything, and it doesn't require any configuration) and connect the output from the Script Component to it, with a Data Viewer attached. This will let us easily examine the output records from the script.

    07-FinalDataFlow
  9. Now, when you execute the package, you can see the split output data as desired.

    08-DataViewerOutput

Well, that ended up having more steps (and more pretty pictures) than I expected it to, but I stand by my initial assessment of this as being a trivial task, once you've seen how the Script Transformation works. And if this still seems complicated to you, make sure you check out The Rational Guide to Extending SSIS 2005 with Script by Donald Farmer. It's an excellent book that anyone who expects to spend time writing script in SSIS should read.

Thursday, June 19, 2008

New SSIS Project Templates in RC0

I've finally found the time to install SQL Server 2008 RC0 (although my SSIS/BIDS install is corrupted so I have not been able to run SSIS in RC0 through its paces) and noticed something interesting. I was creating a new C# project and noticed this:

NewProjectTemplates

Do you notice the two new project templates in the tree view on the left? That's right - Microsoft has added new class library (DLL) templates for creating custom Control Flow tasks and Data Flow components in C# and VB.NET. It's pretty unlikely that SSIS component development will ever approach the ease of use of the Script Task and Script Component, but having a Visual Studio project template that sets up much of the component plumbing is a nice step forward.

UPDATE: Upon further reflection (and a very helpful email) it turns out that these project templates are not designed for SSIS component developers. They are instead (as the project names should have clued me in, were I not so excited over my discovery) the templates that are used by Visual Studio Tools for Applications (VSTA) in the Script Task and Script Component inside a package. When you add a new Script Task or Script Component to your package, VSTA creates a new instance of one of these templates.

Wednesday, June 18, 2008

SSIS Excels - But Not at Excel

In the two months since I began using Google Analytics to track visitors to this blog, I've noticed a few things:

  • Two of the top three articles on the blog are related to working with Excel.
  • 26 of the top 100 searches that referred users to the blog had to do with Excel in one way or another.
  • SSIS doesn't deal with Excel data particularly well.

Ok, I'd actually noticed that last one long before I started using Google Analytics. And you've probably noticed it too.

I've long considered posting more about using SSIS and Excel, but I haven't, primarily because it's not something I deal with regularly in the real world, and because it's too annoying an exercise for me to go wading into it when I don't really have to.

And yes, I suppose that's another way of saying that I don't really love you enough to do the work.

But fortunately, Douglas Laudenschlager[1] does love you - he loves you more than you know. How do I know this? Because he is writing a three-part series of blog posts about how to work with Excel data in SSIS. Check this out:

  1. Connections and components
  2. Tables and data types
  3. Issues and alternatives

These articles present great information about how Excel manages data, what connectivity options are available and how to effectively use those options from within SSIS. I honestly don't know if there is anything in them that is not available elsewhere, but I can say with certainty that there is no other single location that covers all of the information that Douglas has gathered into these posts.

Please do yourself a favor and read Douglas' posts. If you deal with Excel data in your SSIS packages, you owe it to yourself. I know that the next time I need to wrestle with Excel in SSIS, I'm going to re-read all three posts before I spin up Visual Studio...

UPDATE: As of 21 June 2008, Douglas has posted the third and final installment in this series, and the link has been added above. I love the fact that he starts off with "The Root of All Evil." That's Excel in a nutshell...

 

[1] In case you haven't been paying attention, Douglas is a technical writer on the SSIS documentation team. He started blogging back in late April 2008, but has been consistently posting such excellent technical content that his blog has been on my "must read" list ever since.

It's Not Easy Being BI

I've been known to sometimes say that "being BI effectively doubles your chances of getting a consulting engagement on a Friday night." Yeah, it's kind of cheesy, but if you can't have fun with wordplays, what can you do, right?[1]

Well, the joke is sounding a little less funny today.

I'm currently involved in organizing a business intelligence conference[2] and have been sending emails to colleagues and potential speakers and have not been having the best of luck. In fact, when sending one email to an employee of a certain software company located in the Pacific Northwest, I got this explanation for why the email was rejected:

<Your e-mail was rejected by an anti-spam content filter on gateway (1.2.3.4). Reasons for rejection may be: obscene language, graphics, or spam-like characteristics. Removing these may let the e-mail through the filter.>

Yeah, no kidding.

I then went through the email and replaced all instances of "BI" with "Business Intelligence" and the email went through without error. But today I've been fighting with a few more similar problems - what a way to waste a few hours I didn't have to spare.

Has anyone else seen this issue?

 

[1] There are many other humorous stories that revolve around this particular play on words, like the well-known Microsoft speaker who wore his "BI Power Hour" t-shirt to the hotel bar at TechEd last year (not knowing that there was a gay pride event going on at the same time) and ended up having lots of nice men buy him drinks.

[2] More details on this in the days and weeks ahead - stay tuned!

Let the Blog Times Roll

Just as a quick note, if you read my blog at http://bi-polar23.blogspot.com/ and not through an RSS reader, you'll notice something new today. I've added a blog roll with links to those SSIS-related (or vaguely SSIS-related) blogs to which I subscribe myself, and the most recent post on each blog. If you're looking for more SSIS goodness, check them out.

And while I'm blogging about blogging[1] anyway, let me share a pet peeve with you. What is it with people who configure their blogs' RSS feeds to only include the first few lines of each post? This drives me crazy - I like being able to go into Outlook (my RSS reader of choice) and actually read the new posts. Instead, I have to go into Outlook, read what little text is actually there, and then click to open up the rest of the article in a web browser. What's up with that? I know it's not a big deal, but it's a little annoyance that takes much of the joy out of there being a new post on an interesting blog. You guys know who you are - are you doing this on purpose?

 

[1] Or metablogging, because I never metablog I didn't like...

Tuesday, June 17, 2008

Did you Miss PacMan?

At the Microsoft TechEd conference in Orlando earlier this month I mentioned my PacMan project on CodePlex to several different audiences, including during each of my breakout sessions. PacMan is my "SSIS Package Manager" utility that I've built in C# to make my own life easier, but enough people expressed interest in it that I shared the utility online with hopes that it would make their lives less painful as well.

Well, although I may have succeeded to some extent, the very nature of PacMan probably limits its analgesic potential. This is because PacMan is:

  • A "rough and dirty" development utility. This is code that I, as a developer, wrote for myself, as a developer, to use. This means that I was focused on solving short-term tactical goals, and not on building a general-purpose reusable framework for solving longer-term strategic goals. To put this another way, I expected to have to go in to PacMan and write a little code any time I wanted it to do something; I didn't have the time, energy or inclination to write all of that code up front.
  • Largely undocumented. Other than the source code itself, there is little help for developers who want to start using PacMan for their own purposes.

Sadly, my schedule is highly unlikely to allow me to address the first point at any time in the foreseeable future. PacMan will likely always remain a "rough and dirty" utility, because the same work that keeps it valuable to me also keeps me too busy to refactor and refine it into something better.

But the second bullet is something I am likely able to do something about sooner rather than later. In fact, I plan on doing a little something about it today. Right now. Right Here:

PacMan Overview:

The whole point of PacMan is to perform batch operations on groups of SSIS packages. That's it. Because SSIS does not provide any built-in features for working with multiple packages at one time, this was something that I felt was sorely needed. Specifically, I needed a way to add a new variable to close to 100 packages. Obviously manually updating the packages wasn't an option, so PacMan was born.

The PacMan utility is implemented in a Visual Studio solution with two projects: a Class Library (DLL) Components project and a Windows Forms UI project:

Figure 11

The Components project implements a small set of classes that encapsulate access to objects in the SSIS .NET object model. The PackageUtil and PackageCollectionUtil classes are the two most significant ones, as we'll see later on.

PacMan UI:

The PacMan UI is exceptionally simple.[1] At the top of the form there are four options for selecting the scope of operations for whatever work is going to be performed - a single package, a single Visual Studio project, a Visual Studio solution containing one or more projects, or a file system folder containing multiple subfolders and packages. At the bottom of the form there is a set of tabs; each tab contains data entry controls for initiating a specific operation that will be performed on the packages that are in scope.

Figure 10 

PacMan Components:

As mentioned above, the two main classes in the Components project are the PackageUtil and PackageCollectionUtil classes.

The PackageUtil class is essentially a thin wrapper around a Microsoft.SqlServer.Dts.Runtime.Package object. The PackageUtil class exposes a Package object through its SsisPackage property, and also exposes a set of properties and methods that make manipulating the package more straightforward.

The PackageCollectionUtil class is a List of PackageUtil objects, along with a set of properties and methods for manipulating the packages in the List.

When an operation scope is selected through the PacMan UI, an instance of the PackageCollectionUtil class is created and stored in the class-level packages variable which is in scope and available anywhere within the PacMan UI.

Most interesting scenarios in PacMan revolve around enumerating the packages collection and doing something[2] with each of the packages it contains.
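
To make the shape of these classes a little more concrete, here is a rough sketch of what they look like. This is just an illustration based on the description above - the actual code on CodePlex is the authoritative version, and any member not mentioned above (such as the constructor and the Save implementation) is an assumption on my part:

using System.Collections.Generic;
using Microsoft.SqlServer.Dts.Runtime;

public class PackageUtil
{
    private Package package;               // the wrapped SSIS runtime package
    private string packageFilePath;        // where the package lives on disk

    public PackageUtil(string packageFilePath)
    {
        this.packageFilePath = packageFilePath;
        // Load the package through the SSIS runtime Application object
        this.package = new Application().LoadPackage(packageFilePath, null);
    }

    // Expose the underlying Package object for direct manipulation
    public Package SsisPackage
    {
        get { return package; }
    }

    public string PackageFilePath
    {
        get { return packageFilePath; }
    }

    public void Save()
    {
        // Persist any changes back to the original .dtsx file
        new Application().SaveToXml(packageFilePath, package, null);
    }
}

public class PackageCollectionUtil : List<PackageUtil>
{
    public void Save()
    {
        // Save every package in the collection
        foreach (PackageUtil p in this)
        {
            p.Save();
        }
    }
}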

Using PacMan:

The basic pattern of using PacMan goes something like this:

  1. Get the PacMan code from CodePlex and open it in Visual Studio.
  2. Add your own code to PacMan. You can do this either by adding a new tab to the UI or reusing an existing tab. There is an existing "Dev Workspace" tab that I use for experimenting or for one-off efforts when I don't want to be bothered building a UI for what I'm working on.
  3. Feel good about accomplishing so much with so little effort.

Ok, so it may not be quite that simple all the time, but that's all I can think of right now. The nice thing is that the packages collection takes care of most of the hard work - all you need to do is worry about working on a single package at a time and PacMan does the rest of the work.

PacMan Use Case Example 1:

A typical example of using PacMan may look like this. The code below is used to rename a connection manager in all selected packages:

private void buttonSample_Click(object sender, EventArgs e)
{
    string oldName = "LocalHost.AdventureWorksDW";
    string newName = "AWDW";

    foreach (PackageUtil p in packages)
    {
        if (p.SsisPackage.Connections.Contains(oldName))
        {
            p.SsisPackage.Connections[oldName].Name = newName;
        }
    }
    packages.Save();
}

It doesn't get much simpler than that, does it? Imagine trying to reliably rename objects across a project or solution that contained dozens or hundreds of packages.

PacMan Use Case Example 2:

Some operations may require more code than what was shown above. For example, I recently needed to review a set of several hundred packages to ensure that there were no duplicate package IDs. To do this I updated the PackageCollectionUtil class to add a GetPackageIDs method that returns a SortedDictionary collection of the package IDs and the names of the packages that use them. The code looks like this:

public SortedDictionary<string, List<string>> GetPackageIDs()
{
    SortedDictionary<string, List<string>> ids =
        new SortedDictionary<string, List<string>>();

    foreach (PackageUtil package in this)
    {
        string id = package.SsisPackage.ID;
        if (!ids.ContainsKey(id))
        {
            // New key - add a new dictionary
            ids.Add(id, new List<string>());
        }
        // either way, add the package path
        ids[id].Add(package.PackageFilePath);

    }
    return ids;
}

Then, in the UI code, I could simply call this method and update a TreeView to display the results to the user:

private void buttonEnumerateIDs_Click(object sender, EventArgs e)
{
    if (packages != null)
    {
        SortedDictionary<string, List<string>> ids =
            packages.GetPackageIDs();
        BuildPackageIdTreeView(ids);
    }
}

private void BuildPackageIdTreeView(SortedDictionary<string, List<string>> packageIDs)
{
    treeViewPackageIDs.Nodes.Clear();
    foreach (KeyValuePair<string, List<string>> id in packageIDs)
    {
        if (!checkBoxShowOnlyDuplicates.Checked || id.Value.Count > 1)
        {
            // add a node for the ID
            TreeNode newNode = treeViewPackageIDs.Nodes.Add(id.Key);
            // add a child node for each package
            foreach (string packagePath in id.Value)
            {
                newNode.Nodes.Add(packagePath);
            }
            newNode.Expand();
        }
    }
}

Summary:

As you can see, PacMan provides a framework for developers to more easily perform operations on groups of packages. Those developers will still need to write code, but the scope and complexity of the code should be significantly reduced.

Please expect to see a few more PacMan-focused posts in the days and weeks ahead. There are some features I showed off at TechEd that can probably use some additional explanation, so now that I've laid the framework for further discussion, I can start work on those posts.

In the meantime, if you have any questions, comments or suggestions on PacMan, please feel free to post them here or to the discussion forum on the PacMan site on CodePlex. I can't guarantee that I'll respond to each one in a timely manner, but I'll do my best. Enjoy!

 

[1] I was going to write "ugly" here, because I know what it looks like. A UI designer I'm not.

[2] Yes, I suppose this goes without saying, as doing nothing is not a particularly interesting scenario, but the "something" in question is up to the developer.

Saturday, June 14, 2008

SSIS on 64-Bit Windows

I've just returned home from the TechEd 2008 conference in Orlando - what an amazing two weeks this was. Even though the conference ended only yesterday, it already seems vaguely unreal, as if it were too much fun to have been real.[1]

But while I was having fun at TechEd, members of the SSIS team were hard at work[2] writing about one of my favorite topics: SSIS deployment. And since 64-bit deployment was such a large portion of my second breakout session[3], it seems like synchronicity[4] that both Douglas Laudenschlager and Matt Masson blogged on the topic in the last few days. Check here to see Douglas' excellent post about considerations when using SSIS on 64-bit machines, and here to see Matt's follow-up about how the SQL Server Integration Services job step type in SQL Server Agent (just introduced in SQL Server 2008 RC0!) now provides the option to use the 32-bit SSIS runtime. That's pretty cool. I doubt I'll be using this and giving up DTEXEC any time soon, but it's a good step forward for 64-bit deployments.

Side note: Expect another post or two about SSIS deployment in the next few days. I realized on the flight home that I neglected to mention a few important considerations during that breakout session, so once I've had the time to remind my family who I am, I'll fill in the gaps here. Stay tuned!

 

[1] Although I am pretty sure it really was real. Otherwise I'll need to come up with an alternate theory about where all these t-shirts came from.

[2] Now I know why I didn't see these guys at the show...

[3] BIN-450: SQL Server Integration Services Deployment Best Practices

[4] Or perhaps some other Police song

Wednesday, June 11, 2008

SSIS, SQL Server Agent and WMI

I need to hang out with Donald Farmer more often. He knows more about SSIS than I will ever forget - or something like that.

Earlier today I co-presented another interactive theater session (largely a repeat of the session we did last week during the developer conference, but with some new content and a different vibe) with Donald and (as before) I learned a few things about using SSIS that I had never seen before. One of them has to do with one of the most popular[1] SSIS topics: SSIS and SQL Server Agent.

This lesson doesn't actually have much to do with SSIS - it has to do with how you trigger the execution of a SQL Server Agent Job. Normally SQL Server Agent Jobs are executed on a schedule, so that the same packages run at the same time every day. To be honest, this is the only way I have ever used SQL Server Agent. But (as I learned today) you can also configure your SQL Server Agent Jobs to be triggered by Windows Management Instrumentation (WMI) events as well.

What does this mean?

It means that any events that are raised by the operating system (or other software that uses WMI) can be used to trigger the execution of your job and your packages. Think about having packages that automatically run when your processor utilization drops below a specific threshold as an example. The possibilities are endless[2] and the setup is documented in SQL Server Books Online here: http://technet.microsoft.com/en-us/library/ms366332.aspx
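
For the processor-utilization example above, the WMI event alert behind the job would use a WQL query along these lines - the polling interval and threshold here are purely illustrative, and the namespace is the standard \\.\root\CIMV2 namespace:

SELECT * FROM __InstanceModificationEvent WITHIN 60
WHERE TargetInstance ISA 'Win32_Processor'
AND TargetInstance.LoadPercentage < 20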

Enjoy!

 

[1] At least if the web traffic reports for my blog are to be believed.

[2] This is a nice way of saying that I don't know enough about WMI to come up with more examples without making a fool of myself.

Script Tasks as Execution Placeholders

I've blogged previously[1] about a technique that involves using the Script Task in SSIS as an "execution placeholder" - essentially just an "empty" task that doesn't do any work, but which gives you something from which to connect a precedence constraint with an expression, enabling dynamic execution logic downstream. All in all it works pretty well, since the Script Task in its default state returns Success and doesn't actually do anything. It's simple, and it works.

Or so I thought.

If you take a look at the comments for the post referenced above, you'll see that Bart Duncan asked if there was an advantage to using the Script Task as opposed to an empty Sequence container. My response was basically that either one should work fine. Well, this is the case for SSIS in either SQL Server 2005 or SQL Server 2008, but apparently this is not the case for upgrading packages from 2005 to 2008. I have one package in particular (it's a "master controller" package that is responsible for orchestrating the execution of lots of other packages, so it has a lot more control flow logic than I would usually put in one package) and when I recently did a test upgrade[2], I got a bunch of errors. In essence, they all say this:

Error at Script Task Name [Script Task Name]: No entry point was found during the migration of "ScriptTask_91682d2bfda94fdcb87f72f4c97cd852". Add an entry point to ensure that the script runs.

Apparently (please note that this is an educated guess, not something that I have researched and validated in depth) when you add a Script Task to the SSIS 2005 control flow design surface, the designer does not call out to Visual Studio for Applications (VSA) to precompile the code. This is OK in 2005, because the VSA runtime knows how to handle these "uninitialized" Script tasks, but when upgrading to SSIS 2008, which uses Visual Studio Tools for Applications (VSTA) instead of VSA, the package upgrade wizard and/or VSTA don't know how to handle this previously-valid scenario.

So I guess I owe Bart an apology, eh?

In any event, there are two simple workarounds that you can perform with your SQL Server 2005 SSIS packages that use this technique:

  1. Replace each "empty" Script Task with an empty Sequence Container. This will eliminate the problem altogether as it eliminates the use of Script tasks entirely.
    or
  2. Open each "empty" Script Task in the VSA editor and then close it back out again, clicking OK to save changes. This will eliminate the problem because it gets the Script tasks into a state that the upgrade wizard knows how to handle.

At first glance, option 1 probably looks more attractive, but it is a little more complicated than that. Remember: the whole reason we're having this discussion in the first place is because we needed a Control Flow component from which we could drag a precedence constraint with an expression. The problem that this introduces is that if we delete a Script Task in order to replace it with a Sequence Container, the precedence constraint will be deleted as well. Recreating the precedence constraint is simple enough, but then you would need to manually edit each affected constraint to recreate the expression as well. In my "master controller" package I had dozens of these constraints, so manually rebuilding everything didn't have much appeal - I went for option 2.
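
For what it's worth, the expressions on those constraints are usually simple comparisons against package variables - something along these lines, where the variable name is just an illustration:

@[User::ProcessDimensions] == TRUE

With the constraint's evaluation operation set to "Expression and Constraint," the downstream task only runs when the placeholder task succeeds and the expression evaluates to true.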

[1] Well, I guess I've been busy. When I looked up that previous post so I could link to it, I was surprised to discover that it was from May of 2007. I didn't realize that I'd reached my one year blogging anniversary already. How time flies when you're having fun...

[2] It's worth noting that this upgrade was performed using the February CTP of SQL Server 2008; I have yet to make the time to get RC0 installed, as I am somewhat tied up at TechEd and do not have a machine to sacrifice this week.

Tuesday, June 10, 2008

Master Data Management

This morning I attended an Interactive Theater session with Rajesh Patel (Program Manager) and John McAllister (Principal Program Manager) from the Master Data Management (MDM) team at Microsoft. I've been following MDM on and off for the last year or so, after I got to see Roger Wolter present on MDM at TechEd 2007, but there hasn't been a lot of news to follow. So when I saw this session on the schedule I knew I had to attend.

The session itself was great - John and Rajesh came in with a slide deck and an agenda, but when they learned (following a show of hands from the audience) that the people in attendance wanted something different, they basically threw away the slides and gave us exactly what we asked for, and it was lots of fun.

Unfortunately, there wasn't a lot of new information that I hadn't heard before (although it was great to hear it again from the people who are building the software), but there were a few gems that I picked up:

  • Microsoft MDM will ship as part of Microsoft Office SharePoint Server (MOSS) in the next release of Office - Office 14. It will be a shared service in MOSS and the admin and data steward UI will be presented through the MOSS portal.
  • The MDM team is definitely thinking about the synergy between MDM and BI. I didn't come away from the session with a really clear story in my mind about how (for example) an SSAS dimension would relate to an MDM hierarchy, but I attribute that vagueness to the poor acoustics rather than to a vague vision on the part of the presenters.
  • There is a technology preview version of MDM that is available today. You can email mdmvibe@microsoft.com and request access to the software. The bits you will get will be the same Stratature software that Microsoft acquired a year or so ago, but if you want to start working to see how MDM will fit into your BI solution or into your enterprise, it's a great opportunity to get a head start.

Now I wonder what other tidbits I'll pick up this week...

Auto Connect: Another SSIS Lesson Learned

As I mentioned in a recent post, I had the pleasure to present with Donald Farmer at the TechEd Developers conference last week. It was a lot of fun, and I learned a few things during the 75 minute session. One of them you've already seen, but here's another one - SSIS Designer Options in Visual Studio. This isn't quite as interesting as the ability to share a lookup cache, but it's still always nice to learn something new about a familiar tool. Here's the deal:

In Visual Studio, select Options from the Tools menu. In the Options dialog, expand the Business Intelligence Designers node and then select the Integration Services Designers node to get to the interesting stuff.

The General tab has a few options - the two that are most interesting for me are the "Show precedence constraint labels" option and the "Script Language" option:

01-General

The "Show precedence constraint labels" option will, when turned on, automatically display the constraint option (Success, Completion or Failure) on the control flow design surface for each constraint. I personally think that this is a good idea, but it would be a great idea if it displayed the constraint option and the constraint expression (as if the ShowAnnotation property were set to ConstraintOptions) since this is something that cannot be conveyed through the color of the constraint arrow.

But (oddly enough, given the setup at the top of this post) this isn't what I learned during the TechEd session last week. This is just something that I stumbled across while trying to get screen shots for the other two options below. The really cool stuff I learned is on the Control Flow Auto Connect and Data Flow Auto Connect tabs:

02-ControlFlow 

03-DataFlow

Basically what these options (click on the images to see more detail) give you is the ability to have your control flow tasks or data flow components automatically connected without your needing to drag the arrows to connect them. You can simply enable the auto-connect feature, specify the direction (right, left, above or below) in which the new component should be added (and for control flow, the type of precedence constraint to add) and then when you double-click on a toolbox item, it will be automatically added to your design surface and hooked up (based on the settings you specify here) to the component you have selected.

Pretty cool, eh?

And there's even better news! Based on the wonderful feedback we received following our session last week (well, I'm sure the fact that I asked nicely helped too) the repeat session at the TechEd IT Pro conference this week will feature both Donald Farmer and myself. I'll be sure to let him do all of the hard work like he did last week, but everyone had a great time last week and I'm sure this week will be even better. Please be sure to meet us at 2:45 tomorrow afternoon in Green Interactive Theater 2 in the "big room" at the Orange County Convention Center.

SSIS and EBCDIC

First and foremost, thanks to Douglas Laudenschlager for bringing this to my attention.

So with that out of the way, I'd like to point out that Amino Software have finally released their Lysine product, a custom SSIS component that enables the use of EBCDIC data in the SSIS data flow. Here's the blurb:

Lysine is an SSIS (SQL Server Integration Services) custom component that enables the inline conversion from EBCDIC (including redefines and packed decimal fields) into an ASCII byte stream to use within the pipeline in your SSIS package.

The major benefits of Lysine are two-fold:

  1. It saves you from having to write a pre-processor to unpack and convert the binary data.
  2. It prevents you from having to pre-process the data saving an extra, and many times costly, trip through the data.

Features

  • Conversion of EBCDIC Field Types
    • Comp3
    • Zoned
    • Redefines
    • Occurs
    • Occurs Depending On
  • Realtime preview of data as file definition is setup in an intuitive UI
  • Inline processing of data -- a single pass through the data.

Now normally I would not simply copy and paste a software vendor's marketing blurb into my blog - not without getting some sort of kickback anyway - but since so far as I know there is no other software on the market that solves this problem, and since this problem seems to come up pretty often, I thought that this was a good time to make an exception. It's not free (and for $1,499 some might argue that it's not cheap either) but if you need to convert EBCDIC data to ASCII data, odds are it will pay for itself pretty quickly. I haven't had the time to run it through its paces (and with a little luck I won't see any EBCDIC in the near future) but if you're struggling with this pain today you might want to check it out. Enjoy!

Luncheon, Redux

Even though TechEd has now moved on into its second week, and I am hard at work preparing for my two breakout sessions on Thursday and Friday (I've actually been "ready" for weeks, but when I have a big presentation coming up, I can't help but obsess about it) I wanted to share one more post about one of the many highlights from the first week of the conference. The folks at Microsoft who organized the delightful Influencers Roundtable Luncheon last week have sent out the group photos that were taken at the beginning of the event. Take a look:

gateslunch1_small

gateslunch2_small

And to sort of round out this whole event, S. Somasegar has also blogged about his impressions of the luncheon. It is fascinating to me that he came away with the same impression from the luncheon that I did:

"I left feeling somewhat overwhelmed by the passion that people in the room had for a single topic that took up the majority of the conversation.  The conversation centered around how we can better marry IT skills and expertise to non-profit work in the areas of healthcare and education.  All of the conversations and comments focused on how each of the people in that room could step up and bring their expertise in technology and IT to help people that are less fortunate"

I've blogged on this in the past, although not recently, but this luncheon and Soma's blog post have both reminded me how vital it is that each of us take advantage of his or her own unique strengths to help others. Whether it's something as simple as helping out at your local school or community center, or as involved as volunteering for the NetHope project, there is something that each of us can do to improve the lives of those around us. It's so easy to make a big difference with a small effort. And it's delightful to know that I'm not the only one who feels this way...

Monday, June 9, 2008

SQL Server 2008 RC0 Available

The first release candidate build of SQL Server 2008 is now available for download on the MSDN and TechNet subscriber download sites. Unfortunately, this comes a little too late for me to rebuild my demo VMs for my breakout sessions here at TechEd later this week, but it's exciting news nonetheless. It's no surprise that a big interim release like this would land during TechEd, but that doesn't make it any less welcome.

One of the interesting (and to me, unexpected) tidbits available with RC0 is information about new editions of SQL Server. Take a look here:

http://technet.microsoft.com/en-us/library/cc645993(SQL.100).aspx

In addition to the existing editions (Enterprise, Standard, Workgroup, Developer and Express), there are now also Web and Small Business editions coming in SQL Server 2008. The Small Business edition doesn't show up in the online version of BOL referenced above, but it does show up in the docs installed with RC0. I guess they're still finalizing things...

Saturday, June 7, 2008

Sharing The Lookup Cache

Sometimes I'm the last person to know. I like to think that I know at least a little about SQL Server Integration Services, but it still seems that I'm always discovering some technique or tidbit of knowledge that had previously escaped my notice. I learned one such lesson this week during the TechEd Developers conference in Orlando. I was co-presenting with Donald Farmer[1] during his "Microsoft SQL Server 2008 Integration Services: From D'oh to Wow" session Friday morning. Here's the deal:

In the SSIS data flow (both 2005 and 2008), the in-memory cache for a fully-cached Lookup Transformation can be shared between multiple Lookup Transformations, so long as they all use identical queries to populate their cache. This means that if two Lookups can be implemented with the same query (not a particularly common scenario, but it does happen from time to time) the performance of the package can be improved by loading the lookup data into memory only once.

There are opportunities here to refactor the lookup queries in existing data flows. Consider a scenario where a data flow contains two Lookup Transformations, one to add a Department Name field to the data flow and one to add a Department Contact field, both looking up based on the Department Key. A natural approach would be to have each Lookup use just the two columns it needs, thus making the lookup data as narrow as possible. (This is a best practice, after all, right?) But by having both Lookups use the same three-column lookup query, the cache is populated only once, which reduces the volume of data being read and cached and can improve package performance (see the sketch below).
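Here's a minimal T-SQL sketch of that refactoring, assuming a hypothetical dbo.Department table (the table and column names are made up purely for illustration):

    -- Narrow approach: each Lookup gets its own two-column query,
    -- so the department key ends up being read and cached twice.
    SELECT DepartmentKey, DepartmentName FROM dbo.Department
    SELECT DepartmentKey, DepartmentContact FROM dbo.Department

    -- Shared approach: both Lookups use this identical three-column query,
    -- so the in-memory cache is populated only once.
    SELECT DepartmentKey, DepartmentName, DepartmentContact FROM dbo.Department

As far as I can tell, the query text needs to match character for character for the cache to be shared, and each Lookup is still free to map only the columns it actually needs into the data flow. What's not to like?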

Funny you should ask.

After the session was over, I went to visit my friend and colleague John Welch, who was staffing one of the SQL Server Business Intelligence booths at the conference. I explained to him what I'd just learned (I was all excited about being able to share this with someone who loves SSIS as much as I do) and asked him if he had ever heard of such a feature. I'll paraphrase here, but his response was basically "uh, yeah." And John filled me in on the dark side of this cool little feature.

Take a look here: http://support.microsoft.com/kb/924016

That's right, there is a known bug in SSIS related to this behavior. Basically, if there are multiple data flows in the same package, or in multiple packages that run in the same process, it is possible for the cache to be cleared before all of the Lookup Transformations are done using it. The KB article above includes some good workarounds (and some not-so-good ones too - I would probably not want to disable caching on my Lookups just to work around this bug) that can make the feature usable, but you need to be careful with it. And of course, if you're using SQL Server 2008 SSIS, you can explicitly cache your data and share it in many more interesting and complex scenarios, thanks to the new Cache Connection Manager options added to the Lookup transformation.

And of course this wasn't the only thing I learned yesterday, but it's the only one for which I have blogging time today. I've got the others filed away, so once TechEd is done (maybe before, but with two breakout sessions to prepare for this week, I wouldn't hope for too much) I'll have more SSIS goodies to blog.

 

[1] Donald was the Program Manager for SSIS during the SQL Server 2005 release, and is now the Principal Program Manager for SQL Server Data Mining. It was a great honor (and a ton of fun!) to share the spotlight with such an engaging, knowledgeable and funny speaker. And to make things even cooler, we're going to team up again for a repeat performance on Wednesday during the TechEd IT Professionals conference.

Wednesday, June 4, 2008

What A Day!

I had the most incredible day yesterday. It was the first day of the big TechEd Developers 2008 conference in Orlando, but that wasn't the thing that had me the most excited. I was (and still am) excited because of my lunch date. So what was I so excited about?

My lunch date was Bill Gates.

No, I'm not kidding. Yes, it was really that cool.

So how did this come about? As you probably know, Microsoft has a wide variety of communities with which it interacts, such as the MCT and MVP programs. Members of these programs, in one way or another, work with users of Microsoft software around the world, and Microsoft values and appreciates the work that members of these communities perform. Well, to make a long story short, I was nominated as a community "influencer" by some kind folks at Microsoft, and in the end was selected as part of a small group (around 15 people) to meet for lunch with Bill Gates and S. Somasegar (who owns the Developer Division at Microsoft) for an "Influencer Roundtable Luncheon."

award[1]

The food was OK (I was hoping for lots of foie gras and black truffles, but the chicken and veggies weren't too disappointing), but the conversation was great. Mr. Gates was totally relaxed and casual, and pretty funny too. For the most part the conversation revolved around ways that people (and companies) can make a difference in the world, which should be no surprise given the makeup of the group.

And there were some very impressive people on the invitee list. One was Charles Hughes, a past president of the British Computer Society. He is working on a global organization to bring standards for professionalism and ethics to the IT world - something that is obviously very much needed. I had a chance to talk with him earlier in the morning, and he made a very strong impression. All of the things that he is doing are things that are dear to my heart, things that I do on a personal level and try to do locally, but he is working at a global scale, which is somewhat awe-inspiring. I don't know quite how I fit in with this august company, but I am delighted and honored to have been involved.

[1] Can you see the Manowar shirt reflected in the photo above?

Sunday, June 1, 2008

Stopping the Breakout!

When people go to technical conferences, they tend to think along lines something like this:

"At 9:00 I'm going to go to this breakout session, and then at 10:30 I'm going to go to this other breakout session and then at 12:00 I'm going to go to that breakout session..."

Conference attendees seem to spend all of their time running from one room to another, just to see some "expert" talk about technologies that interest them. How crazy is that?

Wait a minute, you say! That's not crazy - that's the whole point of going to a technical conference, right?

Wrong!

The point of going to a technical conference is to have access to knowledge and resources that you can't get anywhere else. And if you're planning to attend Microsoft's flagship TechEd conference this week or next week (or both!), then the breakout sessions will all be included on your post-conference DVDs. That's right - Microsoft will mail you a set of DVDs with all of the sessions on them so you can watch them whenever you want to.

So what should you do while you're in Orlando? How about visiting the Technical Learning Center? The TLC is the place to be, if you ask me. This is where the experts will be hanging out, just waiting for people like you to come up, say hi, and ask a question or two. These experts include Microsoft Certified Trainers, Microsoft Most Valuable Professionals, and quite a few members of the product groups that build the Microsoft tools we all love so much. And not only are there experts, but there are also whiteboards, tables and comfy couches and chairs, so you can relax in delightful comfort while you're basking in the inside knowledge that only the people who build the products can have.

Oh yeah, they give away the occasional t-shirt as well.

In any event, it's not too late to fine-tune your conference plans. It would probably be silly not to go to any breakout sessions at all (because you can always go up and talk to the speakers once the session is over), but make sure you take the time to visit the TLC. I promise you won't regret it.

Hot SQL Injection

Normally when we think of SQL injection attacks, we think about web applications, because this is usually where they rear their ugly heads. But although this is the standard vector for SQL injection, we SSIS developers need to understand that there is danger in our ETL applications as well.

One of the techniques many SSIS developers (and I count myself in this group) often use is basing a package variable on an expression, and having that expression build SQL statements from the values of other variables in the package. The risk comes in, of course, because if those "input variables" contain malicious text, they can create the same type of vulnerability as if the text were entered through a web form. So please, make sure that you understand what vectors for input exist, and that they are secured as well as possible. Certainly the nature of most SSIS applications reduces their attack surface, but any time you're constructing a SQL string dynamically, some vulnerability exists.
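To make this concrete, here's a minimal sketch of the pattern I'm describing; the variable, table, and column names are all hypothetical. Imagine a string variable named User::SourceQuery with EvaluateAsExpression set to True and an Expression that looks like this:

    "SELECT OrderID, OrderTotal FROM dbo.Orders WHERE CustomerName = '"
        + @[User::CustomerName] + "'"

If CustomerName is populated from a configuration, a file, or another query, and arrives containing something like x' OR '1'='1, the generated statement no longer means what you intended. Where you can, prefer a parameterized query in the source component instead (for example, SELECT OrderID, OrderTotal FROM dbo.Orders WHERE CustomerName = ? in an OLE DB Source, with the variable mapped to the parameter), and save expression-built SQL for the cases - such as dynamic table names - where parameters can't help, validating the inputs first.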

I mention this today because the Microsoft Security Vulnerability Research & Defense blog had a great post a few days ago that digs into SQL injection in considerable detail. There's nothing SSIS-specific in there, but it should still be considered required reading for SSIS package developers who use this technique. Just because our ETL applications don't generally have a GUI doesn't mean we don't need to be concerned about poisoned input data.

Now, I should go back and finish watching The Breakfast Club...