Business Intelligence Markup Language, or BIML, is a dialect
of XML that describes Business Intelligence objects such as ETL Packages,
Cubes, and even Facts and Dimensions (here we will focus mainly on ETL
Packages). This post will serve as an
intro to this very powerful (and mostly free!) BI development tool. In this post we will just create a singular
SSIS package, but once we dive deeper into BIML’s capabilities we will be able
to build dynamic and automated ETL design patterns. I hope to turn this into a series of posts on
BIML.
Unless otherwise noted, we will assume SQL Server 2014 and
VS 2012, although much of this applies at least back to SQL Server 2008.
Getting Started
First, in order to use BIML, you will need a tool to
translate the actual BIML code into the desired BI object. There are two main options for compiling BIML
code – the free Visual Studio add-on, BIDS Helper, and the proprietary IDE,
Mist, by Varigence. We will focus on FREE :)
·
Assuming you already have SQL Server and Visual
Studio installed, download and install the awesome VS add-in BIDS Helper
(while there, check out the other great features it adds).
·
Once BIDS Helper is installed, if you start a
new SSIS project, you will now have a new option when you right click on the
project in the solution explorer
Now you will get a new file called BimlScript.biml under the
“Miscellaneous” folder in the solution explorer. If you open the file, you’ll see the XML root
node of your new BIML file.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
</Biml>
Before going any further, lets set up our demo tables, Employee and EmployeeDest –
both with identical columns - and insert a few rows.
/****** Object: Table [dbo].[Employee] Script Date: 11/1/2015 4:35:35 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employee](
[EmployeeId]
[int] IDENTITY(1,1) NOT NULL,
[FirstName]
[nvarchar](50) NULL,
[LastName]
[nvarchar](100)
NULL,
[Notes]
[nvarchar](255)
NULL,
[CreateDate]
[datetime] NULL,
CONSTRAINT [PK_Employee] PRIMARY
KEY CLUSTERED
(
[EmployeeId]
ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE =
OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[EmployeeDest] Script Date: 11/1/2015 4:35:43 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[EmployeeDest](
[EmployeeId]
[int] IDENTITY(1,1) NOT NULL,
[FirstName]
[nvarchar](50) NULL,
[LastName]
[nvarchar](100)
NULL,
[Notes]
[nvarchar](255)
NULL,
[CreateDate]
[datetime] NULL,
CONSTRAINT [PK_EmployeeDest] PRIMARY
KEY CLUSTERED
(
[EmployeeId]
ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE =
OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Employee]
([FirstName]
,[LastName])
VALUES
('Erik','H.'),
('Jeff','K.'),
('Ed','B.')
GO
Generate a Simple
Package
As you would with an SSIS package made via the standard UI,
we will start by setting up our connection managers. We do this with a <Connections> and
<Connection> tag as seen below. As
you can see, thanks to BIDS Helper, we also have some intellisense to help us
along.
<Connections>
<Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;
Provider=SQLNCLI11.1; Integrated Security = SSPI;"/>
</Connections>
With our connection defined, we now want to add the BIML to
generate our actual package(s). As with
connections, we start with a <Packages> node, then <Package>, and
within <Package> we have <Tasks>.
The <Tasks> tag loosely translates to the Control Flow tab of an
SSIS package in the UI. So all of the
nodes underneath a task, such as ExecuteSQL as shown below, would be Control
Flow objects.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;
Provider=SQLNCLI11.1; Integrated Security = SSPI;"/>
</Connections>
<Packages>
<Package Name="MyFirstPackage" ConstraintMode="Linear"> <!--Name and
attributes of package. Linear tells BIML to execute the components in sequence.-->
<Tasks> <!--"Tasks"
are roughly equivalent to components on the Control Flow-->
<ExecuteSQL Name="Select Employees" ConnectionName="MyConnection"> <!--This
would be on the Control Flow-->
<DirectInput>
SELECT COUNT(1) AS TotalEmployees
FROM Employee
</DirectInput>
</ExecuteSQL>
</Tasks>
</Package>
</Packages>
</Biml>
So if we take the above code and put it in our
BimlScript.biml file (and assuming we have a DB named Sandbox and the sample
tables created), we can then generate our first package. Right click on the BIML file and select “Generate
SSIS Packages” as shown below.
Assuming no errors, you should now have an SSIS package
called MyFirstPackage.dtsx which looks like the below:
Simple, and useless!
Adding a Data Flow
Now let’s add some Data Flow components to our BIML
file. As you might expect, within the
<Tasks> tag, we now would add a <DataFlow> tag to add a Data Flow
to our SSIS Control Flow. In BIML, all
of the components within a Data Flow are called “Transformations” and thus go
in… a <Transformations> tag. Here’s
a quick review:
In order to add a Data Flow and an OleDbSource to our
package, within the <Tasks> tag, we add
<Dataflow Name="Extract Employee Data">
<Transformations>
<OleDbSource Name="Employee Source" ConnectionName="MyConnection">
<DirectInput>
SELECT [EmployeeId]
,[FirstName]
,[LastName]
,[Notes]
,[CreateDate]
FROM [Sandbox].[dbo].[Employee]
</DirectInput>
</OleDbSource>
</Transformations>
</Dataflow>
after our <ExecuteSQL> tag. You’ll notice that within the <OleDbSource>
node there is another node for <DirectInput>. This is exactly what would go in the OLE DB
Source Editor SQL Command box in the SSIS UI.
This is a good point to note that, as with all XML, BIML IS CASE
SENSITIVE.
We could generate the package again at this point, and
successfully re-create the MyFirstPackage.dtsx package (note that BIDS Helper
helpfully will ask you if you want to overwrite an existing package when you
re-generate it), but it wouldn’t do much yet.
Add a Derived Column
and OLEDB Destination
For the last part of this demo, lets add a simple Derived
Column transform and then our destination.
So after our </OleDbSource> tag, we can add a
<DerivedColumn> tag. As you can
see from the screenshot, the intellisense lists the wide variety of components
that you can use with BIML.
Within the <DerivedColumns> tag, we can have
<Columns> and then each individual <Column> which contains the
value that would go into the Derived Column component’s Expression box in the SSIS UI. You can add multiple columns in one
<DerivedColumns> tag. Also note
that in this case I set it to ReplaceExisting = “true”, just as you can in the
UI.
At this point, our full BimlScript file should look like
this
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="MyConnection" ConnectionString="Data Source=localhost;Initial Catalog=Sandbox;
Provider=SQLNCLI11.1; Integrated Security = SSPI;"/>
</Connections>
<Packages>
<Package Name="MyFirstPackage" ConstraintMode="Linear"> <!--Name and
attributes of package-->
<Tasks> <!--"Tasks"
are roughly equivalent to components on the Control Flow-->
<ExecuteSQL Name="Count Employees" ConnectionName="MyConnection"> <!--This
would be on the Control Flow-->
<DirectInput>
SELECT COUNT(1) AS TotalEmployees
FROM Employee
</DirectInput>
</ExecuteSQL>
<Dataflow Name="Extract Employee Data">
<Transformations>
<OleDbSource Name="Employee Source" ConnectionName="MyConnection">
<DirectInput>
SELECT [EmployeeId]
,[FirstName]
,[LastName]
,[Notes]
,[CreateDate]
FROM [Sandbox].[dbo].[Employee]
</DirectInput>
</OleDbSource>
<DerivedColumns Name="Set Columns">
<Columns>
<Column Name="CreateDate" DataType="DateTime" ReplaceExisting="true"> <!--The value inside the <Column> tags is what would be
in the Expression box on a Derived Column component-->
GETDATE()
</Column>
<Column Name="Notes" DataType="String" ReplaceExisting="true">
"Added via ETL"
</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="Insert Employees" ConnectionName="MyConnection">
<ExternalTableOutput Table="dbo.EmployeeDest"></ExternalTableOutput>
<!--Destination target table-->
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
And if we re-generate the package, and run it, hopefully it’s
all green for you also.
Summary and Next
Steps
Obviously this post just scratches the surface of BIML and
doesn’t even begin to touch on its applications for dynamic automated
development. But like with everything,
with BIML it’s good to start small and build on what you’ve learned. In this post we covered the basic syntax of
BIML, and created a very simple SSIS package to move data from one table to
another with a simple transformation.
One of the best things about BIML is that packages created with it are identical
to any package created in the SSIS UI – other than maybe straighter lines. If you generate packages via BIML, a future developer
does not need to know a thing about BIML to do maintenance work on your BIML
generated package.
In future posts, I will cover more components and uses for
BIML and the added power that BIMLScript - .Net code embedded within BIML – gives us to generate
true dynamic packages.
Erik
Labels: BIML, ETL, SSIS