Saturday, May 31, 2008

A Tip For TechEd

If you're like me, you're preparing for a trip to TechEd, Microsoft's biggest technical conference, to spend a week or two at the Orange County Convention Center in Orlando, Florida. If this is the case, then I have an important tip for you:

Buy new insoles for your shoes.

That wasn't what you were expecting, was it? It's certainly not technical advice, but it will help you get the most out of the conference. The OCCC is huge. Massive. The main room (where the Technical Learning Center will be located) is the biggest room in the world. Seriously big. And that means that you'll be doing a lot of walking, and a lot of standing, and (trust me on this one) there's not a lot of padding between the carpet and the cement floor underneath.

So do yourself - and your feet - a big favor: Buy new insoles before the conference begins. Your feet will thank you. Mine are thanking me already.

Checking SSIS Data Flow Metadata

The SSIS Data Flow is heavily dependent on metadata. In order for it to do anything meaningful, it needs to have access to the metadata for all of the sources and destinations with which it interacts - the column names and data types being the most important. And because of the strongly-typed nature of the memory buffers that the SSIS Data Flow uses to deliver its incredible runtime performance, it tends to be very, very picky about its metadata.

On the SSIS Forums on MSDN there are often posts with questions or problems related to Data Flow metadata. Some of them are related to data type mismatches - why can't I load an ASCII string into a Unicode string column, or join an ASCII string to a Unicode string in a Lookup transformation? Some of them are related to perceived data type changes - I'm loading in a CHAR field, but the data is being treated as NUMERIC!
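
As an aside, the usual fix for that first kind of mismatch is an explicit conversion, either with the Data Conversion transformation or with a cast in a Derived Column expression. As a rough sketch (the column names here are just placeholders), the cast expressions look like this:

    (DT_WSTR, 50)MyAsciiColumn
    (DT_STR, 50, 1252)MyUnicodeColumn

The first casts a DT_STR (ASCII) column to DT_WSTR (Unicode); the second goes the other way, specifying code page 1252. But knowing how to fix a mismatch doesn't help much until you can see where it is.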

In the end, it all comes down to the metadata. But how does an SSIS developer see that metadata? The answer is very easy, but not always obvious. Many people will suggest using a Data Viewer - but this shows only the data, not the metadata that describes it, and it's not always obvious from the way a field is displayed what its underlying type is. Some data flow transformations will show the metadata of their inputs and outputs, but usually this information is hidden deep in the Advanced Editor and can only be viewed one column at a time, which is something of a pain.

Fortunately, there is a very simple technique that works every time: Edit the data flow path arrows.

Consider this simple data flow:

[Screenshot: a simple data flow]

The OLE DB Source component is configured to use this query:

[Screenshot: the OLE DB Source query]

But what is the metadata coming out of the source component? If you right-click on the green data flow path arrow and choose Edit[1], on the Metadata tab of the Data Flow Path Editor dialog box you can see the metadata for all columns in that data flow path in one place:

[Screenshot: the Metadata tab of the Data Flow Path Editor]

Note that in addition to the "obvious" metadata such as the data types and sizes, you can also view information about the sort order for each column and the data flow component that added that column to the data flow.

Now, for a trivial data flow like this one, there isn't much value in digging into the metadata like this. But picture a real-world data flow where you have multiple data sources, or where you're using the Derived Column or Data Conversion transformations to change the data types of columns. In that sort of scenario, it's easy to "lose track" of what's going on where. And when you are running into problems where you're saying "I know it shouldn't be doing this - it doesn't make sense! Why is SSIS giving me this error?" then the best thing to do is to look at the metadata that SSIS is using to make its decisions.

This simple technique has saved me many, many hours on SSIS projects over the years. I can't believe it's taken me this long to blog about it...

[1] You can also double-click on the arrow, but if you're like me and get shaky hands after a pot or two of coffee, right-clicking may be more effective.

Thursday, May 29, 2008

Deploying and Testing Custom SSIS Components

SSIS provides two primary mechanisms for adding custom code to SSIS packages:

  1. You can use the Script Task (for control flow) or the Script Component (for data flow) to add .NET code directly to your package.
  2. You can develop your own .NET assemblies that use the SSIS .NET API to interact with the SSIS runtime to create custom tasks, transformations and more.

In my experience, the vast majority of SSIS developers tend to go with option #1. The Script Task and Script Component are relatively simple to implement, and you can take advantage of the .NET Framework base classes from your VB.NET code (or C# code starting with SQL Server 2008) without needing to worry about building and deploying a separate DLL to accompany your packages. Since the code is contained within the package itself, all you need to deploy is the package.

But therein lies the biggest drawback of using the Script Task and Script Component as well: the code is contained within the package itself. This means that if you have the same logic that needs to exist in multiple packages, then you need to copy it into each of them, which can quickly make for a maintenance nightmare if that shared logic needs to be updated.

And this is where custom components come in. Because they are defined outside the package, their logic lives (and can be updated) in a single shared location and can be used in as many packages as necessary. Even though building a custom SSIS component is "real .NET development," it is relatively straightforward if you have .NET development experience. The SSIS object model defines a set of base classes and interfaces, and there are samples included with SQL Server, so it's easy to cheat during development.
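
For example, here is a minimal sketch of what a custom task might look like. The class name, display name and message are just placeholders, and a real project would also need to reference Microsoft.SqlServer.ManagedDTS.dll and be signed with a strong name key so that the assembly can go into the GAC:

    ' Requires a reference to Microsoft.SqlServer.ManagedDTS.dll
    Imports Microsoft.SqlServer.Dts.Runtime

    ' A bare-bones custom task: inherit from Task and override Execute.
    ' The DtsTask attribute is what makes the class show up in the
    ' Visual Studio toolbox once the assembly has been deployed.
    <DtsTask(DisplayName:="My Custom Task", _
        Description:="Reusable logic shared across packages")> _
    Public Class MyCustomTask
        Inherits Task

        Public Overrides Function Execute(ByVal connections As Connections, _
                ByVal variableDispenser As VariableDispenser, _
                ByVal componentEvents As IDTSComponentEvents, _
                ByVal log As IDTSLogging, _
                ByVal transaction As Object) As DTSExecResult

            ' The shared logic goes here; for now, just raise an information event.
            Dim fireAgain As Boolean = False
            componentEvents.FireInformation(0, "MyCustomTask", _
                "Hello from a custom task", String.Empty, 0, fireAgain)

            Return DTSExecResult.Success
        End Function

    End Class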

But deployment and testing are often a different story. There are a few minor "gotchas" in store for SSIS component developers:

  • In order for SSIS to be able to use a custom assembly it needs to be added to the Global Assembly Cache - the GAC.
  • In order for package developers to be able to use a custom assembly in Visual Studio, the assembly needs to be copied to a "special folder" in the SQL Server install directory tree.
  • Because the assembly is a DLL, it cannot be run on its own for testing; it needs to run in the context of another process, and because the assembly is an SSIS component, that process really needs to be one that is executing an SSIS package.

Now if you're a developer who has experience building other types of .NET applications but has never built components that plug into other products (BizTalk, Commerce Server - you name it), this may seem daunting. But fortunately it isn't - it's really as simple as pie.

First - the GAC. There is a command line utility called GACUTIL.EXE that is included with the Visual Studio SDK and the Windows SDK. On a computer with a standard installation of these two tools, you will find it in these two folders:

  • C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin
  • C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin

Warning: Be certain not to use the version of GACUTIL that you may find in the C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322 folder, as this is the .NET 1.1 version of GACUTIL and SSIS requires .NET 2.0.

Second - the "special folder." The actual folder will vary based on the type of component you're building, but (on a computer with a standard install of SQL Server) they are all subfolders of C:\Program Files\Microsoft SQL Server\90\DTS:

  • \Connections
  • \ForEachEnumerators
  • \LogProviders
  • \PipelineComponents
  • \Tasks

As you can see, there's a folder for each type of component that you can develop.

Now that we know where the assembly needs to go, how do we get it there without having to copy it manually each time the project is built? The answer lies in the project properties, on the Build Events tab:

[Screenshot: the Build Events tab of the project properties]

In the Post-build event command line text box, enter the following text:

gacutil.exe /if "$(TargetPath)"
copy "$(TargetPath)" "$(ProgramFiles)\Microsoft SQL Server\90\DTS\Tasks\$(TargetFileName)" /y

Please note that this example applies to a custom task - if you're building a different type of component, you'll need to use a different folder path. Also please note that the example above does not include the fully qualified path for GACUTIL.EXE. This is because on my development machines I have added the C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin folder to the PATH environment variable, so I don't need to type the full path for the many helpful utilities in this folder. If you have not added this folder to your PATH, you will need to fully qualify the path to GACUTIL.EXE.
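
For example, the same post-build event with the path spelled out would look something like this (assuming the default SDK location shown above):

"C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\gacutil.exe" /if "$(TargetPath)"
copy "$(TargetPath)" "$(ProgramFiles)\Microsoft SQL Server\90\DTS\Tasks\$(TargetFileName)" /y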

With those disclaimers out of the way, note what this Post-build event does: every time the project is successfully compiled, the project output (the custom task assembly in this case) will automatically be added to the GAC and be copied into the "special folder" where the SSIS designers in Visual Studio need it to be. We're good, right?

Right?

Well, there is still the question of testing and debugging. I'm sure that most developers always write perfect code the first time, but I always seem to need to iterate a few times to get my code just right. And the only way to see the runtime behavior of an SSIS component is to add it to a package and run that package. Well, that's straightforward enough, but how do we do that and still be able to set breakpoints in the component code so we can find those pesky bugs? Fortunately, the answer here is just as simple as before, and the place you enable debugging is in the project properties again - this time on the Debug tab:

[Screenshot: the Debug tab of the project properties]

There are two things you need to do here:

  • Check the "Start external program" option and enter the fully qualified path to DTEXEC.EXE in the text box.
  • Enter the appropriate arguments in the Command line arguments text box to reference a package that uses the component you want to debug (an example follows below).
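
For example, assuming a hypothetical test package saved to C:\SSIS\TestPackages\CustomTaskTest.dtsx, the two values might look like this on a computer with a default SQL Server 2005 install:

Start external program: C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTExec.exe
Command line arguments: /FILE "C:\SSIS\TestPackages\CustomTaskTest.dtsx"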

Then, it's as simple as debugging any .NET code: set breakpoints where you want to pause execution, and hit F5. Visual Studio will launch DTEXEC.EXE and attach to the new DTEXEC process. DTEXEC will load the package, which will in turn load and call into your custom component. Your breakpoints will be hit and you'll be able to step through your code to your heart's content.

See - as simple as pie.

Update - 01 June 2008: If you're developing your custom component on a 64-bit Windows computer, remember that the folder to which you need to copy your assembly is under C:\Program Files (x86)\Microsoft SQL Server\90\DTS\, because Visual Studio is a 32-bit application.

Checking the Value of a Variable in SSIS

It's a pretty common debugging and diagnostic task to need to check the value of a variable. The SSIS designers in Visual Studio do provide a set of debugging tools, but sometimes it's quicker and easier to go "low tech" and grab the variable value yourself.

Back in the bad old days of VB 6 we would do something like this:

MsgBox g_VariableName

Back in the worse old days of ASP we would do something like this:

Response.Write(strVariableName)

And today in SSIS we can do something like this:

    Public Sub Main()

        Dim variableName As String = "User::VarTest"
        Dim vars As Variables = Nothing

        ' Lock the variable for reading and get a Variables collection to read from
        Dts.VariableDispenser.LockForRead(variableName)
        Dts.VariableDispenser.GetVariables(vars)

        Dim variableValue As String = CType(vars(variableName).Value, String)

        ' Release the lock before doing anything slow, like displaying a dialog box
        vars.Unlock()

        System.Windows.Forms.MessageBox.Show( _
            String.Format("Variable Value is '{0}'", variableValue), _
            String.Format("Value for Variable '{0}'", variableName))

        Dts.TaskResult = Dts.Results.Success

    End Sub

And (assuming we have a variable named VarTest in our package) when we run the package we'll get something like this:

[Screenshot: the message box showing the variable value]

Pretty simple, and it gets the job done. But there has to be a better way, right? No one wants an annoying modal dialog box popping up during package execution, even when they're developing and debugging. So you may want to do something like this instead:

    Public Sub Main()

        Dim variableName As String = "User::VarTest"
        Dim vars As Variables = Nothing

        Dts.VariableDispenser.LockForRead(variableName)
        Dts.VariableDispenser.GetVariables(vars)

        Dim variableValue As String = CType(vars(variableName).Value, String)

        vars.Unlock()

        ' Raise an information event instead of showing a modal dialog box
        Dts.Events.FireInformation(-1, "Test Script", String.Format( _
            "The value for variable {0} is '{1}'", variableName, variableValue), _
            String.Empty, -1, False)

        Dts.TaskResult = Dts.Results.Success

    End Sub

Now, when you run the package you will get something like this:

[Screenshot: the information message in the Visual Studio Output window]

The output will be written to the Output window in Visual Studio and to any other log providers that you have configured to listen to information messages from the package.

Enjoy!

The Danger of Thinking Grandly

There's a great article today on The Register - my favorite UK IT news web site. I love The Register because they cover so many facets of the IT world that I don't touch regularly myself[1], have a strong anti-Microsoft bias[2], and generally have a dry British sense of humor[3]. All in all, it's a good thing.

Today's article is on "Frameworks and the danger of a grand design" and should be considered required reading for anyone who writes code or builds applications. It's not rocket science, but it sums up quite nicely a bunch of lessons that I have learned through my own pain over the years. If you have also felt that pain yourself, you'll read the article and nod and smile. And if not, you may not quite get it, but next month or next year when you do feel that pain, you'll look back and say A-Ha!

So check out the article today: http://www.regdeveloper.co.uk/2008/05/29/grand_design_syndrome/

[1] Which keeps me informed on topics that I would otherwise know nothing about.

[2] Which keeps me honest, since I'm so firmly entrenched in the Microsoft world myself.

[3] Which keeps me amused.

Wednesday, May 28, 2008

SSIS on RunAs Radio

My interview on the online radio show "RunAs Radio" is now online - check it out here: http://www.runasradio.com/

This is a short (36 minutes, including the opening and closing bits) show where Richard Campbell, Greg Hughes and I talk about SSIS for the IT Professional. Because many of the enhancements to SSIS tend to be developer-focused, I know many IT Pros who shy away from SSIS and cling irrationally[1] to the "good old days" of DTS. But I believe that with a little information about how SSIS works and how to use it, IT Pros everywhere will happily leave DTS behind and embrace the power and beauty of SSIS.

Well, perhaps I'm getting carried away, but the show is a lot of fun nonetheless. We covered topics ranging from architecture to development to deployment and had a pretty good time doing it. So be sure to download the show and check it out.

[1] Ok, so it may seem rational to them, but my memories of DTS are still fresh and painful. ;-)

Thursday, May 22, 2008

Another Reason to Avoid Data Sources in SSIS

I've often railed on the evils of Data Sources - the .ds files that you can add to your SSIS projects in Visual Studio. I tell people to avoid them whenever I'm talking about SSIS deployment, and I even gave them a mention in my SSIS Best Practices blog post last year:

"Don’t use Data Sources: No, I don't mean data source components. I mean the .ds files that you can add to your SSIS projects in Visual Studio in the "Data Sources" node that is there in every SSIS project you create. Remember that Data Sources are not a feature of SSIS - they are a feature of Visual Studio, and this is a significant difference. Instead, use package configurations to store the connection string for the connection managers in your packages. This will be the best road forward for a smooth deployment story, whereas using Data Sources is a dead-end road. To nowhere."

And now there is even one more reason (as if there weren't enough already) to avoid them: they're going to complicate your package upgrade path from SQL Server 2005 to SQL Server 2008. This is due in part to the fact that the SQL Native Client provider name has changed from SQLNCLI to SQLNCLI10 in SQL Server 2008. The SSIS package upgrade wizard will take care of this in your packages and update your connection managers appropriately. But if you're using Data Sources in your project, they will not be changed, so when you next open your properly upgraded packages the new (and correct) connection strings will be overwritten with the old (and incorrect) values and the packages will fail.
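
To make the provider change concrete, here are two hypothetical connection strings (the server and database names are placeholders) showing the before and after:

Provider=SQLNCLI;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;
Provider=SQLNCLI10;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;

The upgrade wizard rewrites the first form into the second in your packages' connection managers, but a Data Source file still holding the first form will quietly put the old value back.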

And that's not what we want, is it?

So save yourself some pain and go with Package Configurations[1] instead, ok?

 

[1] In the interest of full disclosure, I honestly don't know if the upgrade wizard is intelligent enough to find and fix connection strings in package configurations either, but they're still the better choice. ;-)

Thursday, May 8, 2008

RunAs Radio - In The Can

Just a few hours ago I finished recording an episode of RunAs Radio, talking with Richard Campbell and Greg Hughes about SQL Server Integration Services. It was a ton of fun, and we covered a lot of ground for a 30-minute session. Even though there really wasn't an agenda per se, these were the basic topics we covered:

  • Why SSIS rocks, and why most IT Pros won’t appreciate it very much.
  • How SSIS works, and where to go to get started with SSIS.
  • SSIS Deployment – how to make (some of) the pain go away.
  • My two SSIS breakout sessions at TechEd IT Pro in Orlando next month.

This episode should be ready for your listening pleasure in just under three weeks, on May 28th. If you're a DBA and would like to know more about SSIS (or if you would just like to hear the soothing sound of my voice) you should definitely check it out. I'll be sure to post another update here when the show goes live.

After the show was recorded, we also talked for a little while about the possibility of getting back together to record an episode of .NET Rocks! or dnrTV! to go into more depth on the development side of SSIS. It's too early to say if we'll be able to record a follow-up show, but I had a great time today and would love to do it again - what fun!