BIML Part III – Using BIML Script and Custom Meta Data Tables for Dynamic ETL Frameworks

Using BIML, BIML Script and existing metadata, such as INFORMATION_SCHEMA.TABLES, to dynamically generate SSIS packages can be very useful(See my previous post).  But what can be even more useful is to create your own metadata tables which can be used as the drivers for packages created via BIML Script.

Let’s say that you have a scenario where you know that you are going to have to continue making a large number of SSIS packages in the future that will perform similar, but slightly different functions.  Maybe they all use different source and destination tables with different queries, but also some of them have a transform or two while others don’t.  You know that in the future you’re going to have to make countless more of these packages for different feeds, but you don’t know the source queries or which exact transforms each one will need.

Let’s assume we currently know of three tables we need to populate, one of which needs a derived column.  In the future, the other two might also need this derived column, and we know that more tables will follow this same pattern.
We’ll start with 3 source tables and 3 destination tables with the “Dest” suffix.  The destination tables have a couple column differences from the source tables.  We also have a table called BIMLMetaData which will function as the driver for our package generation similar to how we used INFORMATION_SCHEMA.TABLES in the previous post’s example.
You can download the CREATE script here.  The script assumes an existing db called Sandbox, but change it as you like.  It will also insert a few records into the source tables and into our meta data table.

Custom Metadata Table
So now we should have 7 tables - Employee, EmployeeDest, Product, ProductDest, Store, StoreDest, and BIMLMetaData.  In BIMLMetaData, each row we will store the information specific to each package which will be combined with the common elements of all the packages that we will create in our BIML file.  In BIMLMetaData we have:
Three rows should have been inserted into BIMLMetaData from the script above, one for Product, Employee, and Store.  Note that they are all set to IsActive = 1 and Employee is set to NeedsTransform = 1.  The SQLSourceQuery values are just simple select statements which will be used in our packages.

First in the BIML code we will set up our connections and some variables within C#.  This is very similar to my previous post, except you will notice  that highlighted in blue is a query against our BIMLMetaData table instead of INFORMATION_SCHEMA tables as in our previous example.  Also, as in that example, we are going to begin a foreach loop over each row in the result set ‘MyDataTable’.  In the query we also add a WHERE clause for IsActive = 1, this would allow us to “turn off” BIML regenerating any packages we don’t it to.

<Biml xmlns="">
              <Connection Name="SandBox" ConnectionString="Provider=SQLNCLI11.1; Data Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI" />

              //Now we are in C#!
              string MyConnectionString ="Provider=SQLNCLI11.1; Data Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated Security=SSPI";  
              string TargetTableName;
              string TableSchema;
              string SourceTableName;
              int NeedsTransform;
              string SourceSQLQuery;
              DataTable MyDataTable;

       //Populate Data Table with values from BIMLMetaData where IsActive =1                    
MyDataTable = ExternalDataAccess.GetDataTable(MyConnectionString,"SELECT m.SourceTableName,m.TargetTableName,m.NeedsTransform,m.SourceSQLQuery FROM dbo.BIMLMetaData m WHERE m.IsActive = 1");
                     /*Loop over each target table row in MyDataTable */
              foreach(DataRow row in MyDataTable.Rows){
                     TargetTableName = row["TargetTableName"].ToString();
                     TableSchema = "dbo";
                     SourceTableName = row["SourceTableName"].ToString();
                     NeedsTransform = Convert.ToInt32(row["NeedsTransform"]);
                     SourceSQLQuery = row["SourceSQLQuery"].ToString();

The following code snippets are contained within the above foreach loop.  For every record in the BIMLMetaData table we are going to create 1 package, named based on the source table name, count the existing rows in the target table (*Note: we don’t do anything with this count in this demo), and populate the OleDBSource in the Data Flow with the SourceSQLQuery from our meta data table.

<Package Name="Extract <#=SourceTableName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey"> <!--ConstraintMode connects the tasks in order-->
              <ExecuteSQL Name="Get initial count" ConnectionName="SandBox">
                           SELECT COUNT(1) AS Total FROM <#=TargetTableName#>
       <Dataflow Name ="<#=SourceTableName#> Data Flow">
                     <OleDbSource ConnectionName="SandBox" Name="<#=SourceTableName#> Source" >
                           <#=SourceSQLQuery#> <!--Use SourceSQLQuery column from BIMLMetaData to populate the OleDbSource direct input query-->

Next, still contained within the foreach loop, we add another nugget of C# code, this time with an if control statement.  Using the value in the BIMLMetaData.NeedsTransform column for each table, we can have BIML generate components or logic based on our settings in the meta data table.  

<# if (NeedsTransform ==1)
       <DerivedColumns Name="Transform <#=SourceTableName#> Data">
                     <Column Name="CreateDate" DataType="DateTime" > <!--Can do multiple derived columns.  Values are Expressions-->

Here I’m just adding a derived column with a getdate(); but you could use this technique for much more interesting things, such as which type of Slowly Changing Dimension logic to apply.  With this type of a setup, you can configure options and regenerate packages within minutes if a component is added or removed.
Lastly, we just add the OleDB Destination, as I have covered in previous posts, along with our directives at the bottom.
<OleDbDestination Name="Insert into Destination" ConnectionName="SandBox">
       <ExternalTableOutput Table="<#=TableSchema#>.<#=TargetTableName#>"></ExternalTableOutput>
              <#}#> <!--End for each MyDataTable loop-->

<#@ template language ="C#" tier ="2" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>

Once we have created our BIML file (full code sample available here), we right click and Generate SSIS Packages and should get one package per record in the BIMLMetaData table.

What you should notice is that of the three packages generated, only the ‘Extract Employee’ package has our derived column component since it was the only record in BIMLMetaData to have NeedsTransform = 1.

In this demo we took a look at how you can create your own meta data tables to use in conjunction with BIML to build flexible and configurable SSIS frameworks.  Here we only used a few flags to generate packages with or without a derived column task, but the concept could easily be applied to much more complex logic – such as SCD Type of dimensions or what behavior updated Fact table data should use.  The thing to keep in mind is that BIML should just be “one tool in your toolbox”.  You want to be careful not to build a very flexible, yet overly complex, set of BIML files when it would be simpler to create a package the old fashioned way.

Sample Files
Table CREATE script – CREATE_MetaData_Example_Tables.sql
Finished BIML file – BIMLScript_MetaDataTable.biml


Labels: , , ,