Using BIML, BIML Script and existing metadata, such as
INFORMATION_SCHEMA.TABLES, to dynamically generate SSIS packages can be very
useful(See
my previous post). But what can be
even more useful is to create your own metadata tables which can be used as the
drivers for packages created via BIML Script.
Scenario
Let’s say that you have a scenario where you know that you
are going to have to continue making a large number of SSIS packages in the
future that will perform similar, but slightly different functions. Maybe they all use different source and
destination tables with different queries, but also some of them have a
transform or two while others don’t. You
know that in the future you’re going to have to make countless more of these
packages for different feeds, but you don’t know the source queries or which
exact transforms each one will need.
Setup
Let’s assume we currently know of three tables we need to
populate, one of which needs a derived column.
In the future, the other two might also need this derived column, and we
know that more tables will follow this same pattern.
We’ll start with 3 source tables and 3 destination tables
with the “Dest” suffix. The destination
tables have a couple column differences from the source tables. We also have a table called BIMLMetaData
which will function as the driver for our package generation similar to how we
used INFORMATION_SCHEMA.TABLES in the previous post’s example.
You
can download the CREATE script here.
The script assumes an existing db called Sandbox, but change it as you
like. It will also insert a few records
into the source tables and into our meta data table.
Custom Metadata Table
So now we should have 7 tables - Employee, EmployeeDest,
Product, ProductDest, Store, StoreDest, and BIMLMetaData. In BIMLMetaData, each row we will store the
information specific to each package which will be combined with the common
elements of all the packages that we will create in our BIML file. In BIMLMetaData we have:
- Id – simple Identity column
- SourceTableName – The name of our source table
- TargetTableName – The name of our target table
- SourceSQLQuery – The full SQL query that we want
to use as the source for our package
- NeedsTransform – A bit field to determine if
this package requires a transform (in this case a derived column)
- IsActive – A bit field to determine if we need
to generate a package for this row
Three rows should have been inserted into BIMLMetaData from
the script above, one for Product, Employee, and Store. Note that they are all set to IsActive = 1
and Employee is set to NeedsTransform = 1.
The SQLSourceQuery values are just simple select statements which will
be used in our packages.
BIML Code
First in the BIML code we will set up our connections and
some variables within C#. This is very
similar to my previous
post, except you will notice that
highlighted in blue is a query against our BIMLMetaData table instead of
INFORMATION_SCHEMA tables as in our previous example. Also, as in that example, we are going to
begin a foreach loop over each row in
the result set ‘MyDataTable’. In the
query we also add a WHERE clause for IsActive = 1, this would allow us to “turn
off” BIML regenerating any packages we don’t it to.
<Biml
xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="SandBox" ConnectionString="Provider=SQLNCLI11.1; Data
Source=localhost;Initial Catalog=Sandbox;Persist Security Info=True;Auto
Translate=False; Integrated Security=SSPI" />
</Connections>
<Packages>
<#
//Now we are in C#!
string
MyConnectionString ="Provider=SQLNCLI11.1; Data Source=localhost;Initial
Catalog=Sandbox;Persist Security Info=True;Auto Translate=False; Integrated
Security=SSPI";
string
TargetTableName;
string
TableSchema;
string
SourceTableName;
int
NeedsTransform;
string SourceSQLQuery;
DataTable
MyDataTable;
//Populate
Data Table with values from BIMLMetaData where IsActive =1
MyDataTable
= ExternalDataAccess.GetDataTable(MyConnectionString,"SELECT
m.SourceTableName,m.TargetTableName,m.NeedsTransform,m.SourceSQLQuery FROM
dbo.BIMLMetaData m WHERE m.IsActive = 1");
/*Loop
over each target table row in MyDataTable */
foreach(DataRow
row in MyDataTable.Rows){
TargetTableName
= row["TargetTableName"].ToString();
TableSchema
= "dbo";
SourceTableName
= row["SourceTableName"].ToString();
NeedsTransform
= Convert.ToInt32(row["NeedsTransform"]);
SourceSQLQuery =
row["SourceSQLQuery"].ToString();
#>
The following code snippets are contained within the above foreach loop. For every record in the BIMLMetaData table we
are going to create 1 package, named based on the source table name, count the
existing rows in the target table (*Note: we don’t do anything with this count
in this demo), and populate the OleDBSource in the Data Flow with the
SourceSQLQuery from our meta data table.
<Package Name="Extract <#=SourceTableName#>"
ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey">
<!--ConstraintMode connects the
tasks in order-->
<Tasks>
<ExecuteSQL Name="Get initial count" ConnectionName="SandBox">
<DirectInput>
SELECT
COUNT(1) AS Total FROM <#=TargetTableName#>
</DirectInput>
</ExecuteSQL>
<Dataflow Name ="<#=SourceTableName#> Data Flow">
<Transformations>
<OleDbSource ConnectionName="SandBox" Name="<#=SourceTableName#> Source" >
<DirectInput>
<#=SourceSQLQuery#> <!--Use SourceSQLQuery column
from BIMLMetaData to populate the OleDbSource direct input query-->
</DirectInput>
</OleDbSource>
Next, still contained within the foreach loop, we add another nugget of C# code, this time with an if control statement. Using the value in the BIMLMetaData.NeedsTransform
column for each table, we can have BIML generate components or logic based on
our settings in the meta data table.
<# if (NeedsTransform ==1)
{#>
<DerivedColumns Name="Transform <#=SourceTableName#> Data">
<Columns>
<Column Name="CreateDate" DataType="DateTime" > <!--Can do multiple derived
columns. Values are Expressions-->
GETDATE()
</Column>
</Columns>
</DerivedColumns>
<#}#>
Here I’m just adding a derived column with a getdate(); but
you could use this technique for much more interesting things, such as which
type of Slowly Changing Dimension logic to apply. With this type of a setup, you can configure
options and regenerate packages within minutes if a component is added or
removed.
Lastly, we just add the OleDB Destination, as I have covered
in previous posts, along with our directives at the bottom.
<OleDbDestination Name="Insert into Destination" ConnectionName="SandBox">
<ExternalTableOutput Table="<#=TableSchema#>.<#=TargetTableName#>"></ExternalTableOutput>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<#}#> <!--End for each MyDataTable loop-->
</Packages>
</Biml>
<!--Directives:-->
<#@ template language ="C#" tier ="2" #>
<#@ import namespace="System.Data" #>
<#@
import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
Results
Once we have created our BIML file (full code
sample available here), we right click and Generate SSIS Packages and
should get one package per record in the BIMLMetaData table.
What you should notice is that of the three packages
generated, only the ‘Extract Employee’ package has our derived column component
since it was the only record in BIMLMetaData to have NeedsTransform = 1.
Summary
In this demo we took a look at how you can create your own
meta data tables to use in conjunction with BIML to build flexible and
configurable SSIS frameworks. Here we
only used a few flags to generate packages with or without a derived column
task, but the concept could easily be applied to much more complex logic – such
as SCD Type of dimensions or what behavior updated Fact table data should use. The thing to keep in mind is that BIML should
just be “one tool in your toolbox”. You
want to be careful not to build a very flexible, yet overly complex, set of
BIML files when it would be simpler to create a package the old fashioned way.
Sample Files
Erik
Labels: BIML, Dynamic, ETL, SSIS