How to hash a row in an SSIS Script Component

Have you ever had the need to create a row hash in an SSIS Data flow?  No?  Why would I ever need to do that?  What's a row hash, anyway?  Well, Bob Blackburn gives a nice demo that answers all these questions here.

So this post is to add on to that topic.  Suppose that you have some transformations or lookups that were performed in the data flow and you want to hash the end result set of columns before you determine what has changed.  So the steps would be:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Security.Cryptography;
using System.Text;

public class ScriptMain : UserComponent

    public override void Input0_ProcessInputRow(Input0Buffer Row)
        StringBuilder Concat_Columns = new StringBuilder();

        string RowValues = Concat_Columns.ToString();

        HashAlgorithm hash = SHA1.Create();
        byte[] rowByte = hash.ComputeHash(Encoding.ASCII.GetBytes(RowValues));
        Row.Row_Hash = "0x" + BitConverter.ToString(rowByte).Replace("-", "");



Note that the columns might need to be checked for nulls becaus the Append method would fail otherwise.  Also, no string datatypes will need to be converted to string.

And now a brief explanation on what this is doing.  We use a StringBuilder to concatonate all the values for a row (this is why System.Text is needed in the references.  This is then converted to a string.  We use the SHA1 hash alogrithm to convert that string to a byte array.  we then convert that byte array to a string removing the dashes and prefixing the string with "0x."  The purpose of that string manipulation is so that we can match to a TSQL version of the SHA1 hash if we need to.

I think this post was more a safekeeper for myself because I always forget this kooky syntax for computing the hash.  But maybe someone else will find this helpful too.

Mark Wojciechowicz  

Labels: , ,