Have you ever had the need to create a row hash in an SSIS Data flow? No? Why would I ever need to do that? What's a row hash, anyway? Well, Bob Blackburn gives a nice demo that answers all these questions here.
So this post is to add on to that topic. Suppose that you have some transformations or lookups that were performed in the data flow and you want to hash the end result set of columns before you determine what has changed. So the steps would be:
- Add a script component to the data flow
- Add a column to the output called Row_Hash
- Add the following into the script component:
public class ScriptMain : UserComponent
StringBuilder Concat_Columns = new
string RowValues = Concat_Columns.ToString();
HashAlgorithm hash = SHA1.Create();
byte rowByte = hash.ComputeHash(Encoding.ASCII.GetBytes(RowValues));
Row.Row_Hash = "0x" + BitConverter.ToString(rowByte).Replace("-", "");
Note that the columns might need to be checked for nulls becaus the Append method would fail otherwise. Also, no string datatypes will need to be converted to string.
And now a brief explanation on what this is doing. We use a StringBuilder to concatonate all the values for a row (this is why System.Text is needed in the references. This is then converted to a string. We use the SHA1 hash alogrithm to convert that string to a byte array. we then convert that byte array to a string removing the dashes and prefixing the string with "0x." The purpose of that string manipulation is so that we can match to a TSQL version of the SHA1 hash if we need to.
I think this post was more a safekeeper for myself because I always forget this kooky syntax for computing the hash. But maybe someone else will find this helpful too.
Labels: Hash, Script Component, SSIS