Friday, June 20, 2008

Splitting a Delimited Column in SSIS

A recent post on the MSDN SSIS Forums has reminded me that I've been meaning to post on this technique for quite a few months now. The basic problem looks like this: The input rows in a data flow contain a key field and a "list field" whose contents are made up of a delimited list of values. Like this:

KeyField    ListField
1           a,b,c
2           b,c
3           b,d,e
4           c,f,h
5           a,b
  

And the output needs to look like this:

KeyField   ListItem
1          a
1          b   
1          c
2          b       
2          c
3          b
3          d
3          e
4          c
4          f
4          h
5          a
5          b

The bad news is that there isn't any magical "Split Transformation" in SSIS. There are enough variations on data types and delimiters that it would probably be difficult to implement this as a transformation component. The good news is that it is trivial to implement this as a Script Component within the package. Here's how:

  1. Start by adding a data flow task to your package, and adding an OLE DB Source component (that returns the input data above) to the data flow. Then add a Script Component to the data flow as well.
  2. When prompted to select the type of component you want to create with the Script Component, select Transformation - it's the default, so this should be super easy.

    01-SelectType
  3. Next, connect the data flow path arrow from the Source component to the new Script transformation. Because the SSIS data flow is so heavily dependent on metadata, we can't really do anything meaningful with the Script Component until it knows what input data it will receive.

    02-HookUpPath
  4. Next, right-click on the Script Component and select Edit. In the Script Transformation Editor dialog box, select both columns to make them available as input columns so they are available in the Script code.

    03-Inputs
  5. Next comes the one "tricky" part: marking the transformation as asynchronous. Because the transformation will be producing more output rows than it receives input rows, this is a vital step that is sometimes missed by inexperienced SSIS developers. Mark the transformation as asynchronous by setting the SynchronousInputID property of the output to None.

    04-MakeAsynch
  6. Now, because we are building our own output buffer (one of the characteristics of an asynchronous data flow component) we need to add the columns that the output buffer will contain. Select the transformation's output, click Add Column and set the properties (mainly the name and data type) appropriately.

    05-BuildOutput
  7. Finally, what we've all been waiting for: the code. Click on the Design Script button and add the code to perform the actual split.

    06-DesignScript

  8. Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim keyField As Integer = Row.KeyField
        Dim itemList As String = Row.ListField
        Dim delimiter As String = ","

        If Not (String.IsNullOrEmpty(itemList)) Then

            Dim inputListArray() As String = _
                itemList.Split(New String() {delimiter}, _
                StringSplitOptions.RemoveEmptyEntries)

            For Each item As String In inputListArray
                With Output0Buffer
                    .AddRow()
                    .KeyField = keyField
                    .ListItem = item
                End With
            Next

        End If

    End Sub

  9. Once the code is in place, the last thing we need to do is to add a Row Sampling transformation (this one is handy because if it has no output data paths, it doesn't really do anything, and it doesn't require any configuration) and connect the output from the Script Component to it, with a Data Viewer attached. This will let us easily examine the output records from the script.

    07-FinalDataFlow
  10. Now, when you execute the package, you can see the split output data as desired.

    08-DataViewerOutput

Well, that ended up having more steps (and more pretty pictures) than I expected it to, but I stand by my initial assessment of this as being a trivial task, once you've seen how the Script Transformation works. And if this still seems complicated to you, make sure you check out The Rational Guide to Extending SSIS 2005 with Script by Donald Farmer. It's an excellent book that anyone who expects to spend time writing script in SSIS should read.

7 comments:

Rahul Kumar said...

Cool stuff

Andy..!!!! said...

Too good...!!!

Unknown said...

Nice. I needed this exact thing. I was not sure how to do it, but had an ideas this was how it worked.

Unknown said...

Can two delimited columns be split into rows using one script tranformation?
Eg:
KeyField ListField1 ListField2
1 a,b,c A,B,C
2 b,c B,C
3 b,d,e B,D,E
Transformed to:
KeyField ListField1 ListField2
1 a A
1 b B
1 c C
2 b B
2 c C
...etc

Matthew Roche said...

@Shrek: Yes, you can do this with a single script transformation. You'll just need to update the code to parse the row contents differently. I would start by having two string array variables (one for each delimited column) and then instead of using a For Each loop over a single array, use a For loop and address the elements in each array by index number. But the short answer is yes - this should be a relatively simple problem to solve, and the solution approach should be the same.

Epikto said...

Still Good! Thank you :)

Administrateur said...

So fantastic, but i'm on SQL 2008 Bids and the script don't work.

Maybe due to VSTA and C# ?

I can't switch to VB in my scipt component editor .

Regards