CLOBs and BLOBs of Oracle Love

So how does one import a CLOB (character large object) from Oracle into SQL using SSIS (2008)? 

If you have ever tried importing data from Oracle via SSIS, you'll know starting is the hardest part.  If you are looking the best way to get started, please see this post for the funniest (but, sadly, truest) saga of getting connected to Oracle:

The Attunity connectors in the post above support CLOB data, but only in SQL 2012.  On a recent project, I needed to import some XML data which was stored as a CLOB into SQL 2008 R2.  So I busted out the ADO.Net adapter and let the good times roll.... and roll... and after several hours, imported some sample data into my local system.

First, I should note that NTEXT is deprecated and so Microsoft recommends using varbinary(max) or filestream (for larger files) to store BLOB data.  Additionally, I might have directly imported the data into an XML column, but it was not always well-formed XML and so would need to be staged for cleansing.

The ADO.Net connector for SSIS will import the column as NTEXT and this will implicitly convert the column directly to varbinary(max) in an oledb destination (yeah, I know that's deprecated too.  But it's so much better than the ADO.Net adapter.).  However, I had a catch - the XML was encoded in UTF-8 (with a declaration that said as much in the header.)  SQL hates UTF-8 -- won't even let it in the house.  It needs to be 16, no less, just like it's scotch(?).  I don't know, it just needs to be UTF-16.  So I went through some hoops to try to change the text in the declaration, before putting it into SQL, so that it would be easily convertible to XML.  I tried converting it to a string and using the replace() method to change the text and few other similar approaches, but nothing seemed to work.  I settled on using System.Xml.Linq to convert the Blob to an XDocument object as shown below:

   using System.Xml.Linq;

        //get the length of the blob for exporting to string
        Int32 blobLen = Convert.ToInt32( Row.blobColumn.Length);
        string mydata = Encoding.Unicode.GetString(Row.blobColumn.GetBlobData(0, blobLen));

        //parse to xml
        XDocument xml = XDocument.Parse(mydata);
        //change declaration
        xml.Declaration.Encoding = "utf-16";

        //shove it back into a BLOB
        System.Text.ASCIIEncoding  encoding = new System.Text.ASCIIEncoding();
        Byte[] bytes = encoding.GetBytes(xml.ToString());
        //blow away the blob and put the new version back in

The final solution involved a try...catch block for dealing with the malformed XML.  However, thought I would share this script if anyone finds themselves in the same boat.  

Mark Wojciechowicz

Labels: , , , , ,