Parsing JSON in SSIS

2015-05-07

If you're not too familiar with JSON, it is a lightweight data exchange format.  It's what XML wanted to be until too many cooks made it fat and sluggish.  The fact that it does not use markup, makes it a more ideal format for heavy data integration in which hierarchical relationships exist.  In this example, we'll look at person / address records.  Each person can have multiple addresses - the file will contain one record per row and it will look like this (formatted a little for readability):

{
Id : 1,
Name : "Mark Wojciechowicz",
addresses : [{Description : "Home", street : "1 Happy Place",  city: "Motown", state: "NJ"}, {Description : "Work", street : "2 Monkey Lane",  city: "Zoo", state: "PA"}]
}
{
Id : 2,
Name : "Somebody Else",
addresses : [{Description : "Home", street : "4 No Drive",  city: "Hideaway", state: "NY"}]
}

So there is an Id and name for each record and note that straight brackets indicate an array, in which one person can have one or more addresses.  The actual file will have line feeds only at the end of each record.

As of SQL 2014, SSIS has no support for this type of format, which means we will need to parse this with a script task.  Worse, the .NET framework also has no native support for parsing JSON, so we will be referencing an assembly called JSON.Net.  To get this set up we will need to download the source and do the following:

  1. sign the assembly
  2. try to build the project and double click on the errors
  3. Fix the references to other assemblies (comment these out)
  4. and, finally, GAC it (example in PS using gacutil):

$file = "<wherever this solution is saved\Newtonsoft.Json\bin\Debug\Net40\Newtonsoft.Json.dll"
$gactutil = "C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\gacutil.exe"

& $gactutil /i $file 

Now that all that is out of the way.  Time to get to the script.

  1. In the package, add a file connection manager to the JSON file.
  2. Add a script component source to a data flow
  3. In the connection manager section, reference the connection with a sensible friendly name
  4. In the inputs and outputs, create an output for person and one for address
  5. For the script:
    1. create a person object and an address object (these are defined in the classes at the bottom).  The person object will have an array of addresses.  
    2. We get the path to the file from the connection manager and then read each line.  
    3. The contents of each line are parsed directly into our person object using the JSON parser.  
    4. Finally, we add rows to each buffer and pass the contents of the object to the output columns.  



using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.IO;
using Newtonsoft.Json;


[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    public override void CreateNewOutputRows()
    {
        string filePath = Connections.PeopleFile.AcquireConnection(null).ToString();
        using (StreamReader fileContents = new StreamReader(filePath))
        {
            while (fileContents.Peek() >= 0)
            {
                string record = fileContents.ReadLine();
                Person person = JsonConvert.DeserializeObject<Person>(record);
                PersonOutputBuffer.AddRow();
                PersonOutputBuffer.Id = person.Id;
                PersonOutputBuffer.Name = person.name;

                foreach (Address address in person.addresses)
                {
                    AddressOutputBuffer.AddRow();
                    AddressOutputBuffer.city = address.city;
                    AddressOutputBuffer.Description = address.Description;
                    AddressOutputBuffer.street = address.street;
                    AddressOutputBuffer.state = address.state;
                    AddressOutputBuffer.Person_Id = person.Id;
                }
            }
        }
    }


    public class Person
    {
        public int Id { get; set; }
        public string name { get; set; }
        public Address[] addresses { get; set; }
    }

    public class Address
    {
        public string Description { get; set; }
        public string street { get; set; }
        public string city { get; set; }
        public string state { get; set; }

    }


}

Here is the package in action:


Mark Wojciechowicz

Labels: ,