Having an SSIS Component
“Wait” Until a Condition is met or a Time Period Passes
*If you just want the best solution and not the discussion, scroll to the bottom
Background
For a recent project, my colleague (@craigblevin) and I found ourselves reloading
a QA environment of a Data Warehouse from scratch. In this case, our source data consisted of
all flat files – upwards of 70 gigs worth.
In normal operation, our ETL easily handled the incremental data, but we
found that reloading all of the data at once choked up our minimally resourced
SQL Server. So we broke the files up
into more manageable chunks and ran them one at a time (in this case going back
to modify our DW ETLs themselves to process one file at a time was not an expedient
option). After spending the better part
of a weekend taking turns checking on the progress, manually moving a file, and
starting the SQL job again, we realized we could build another SSIS package to
take care of this for us. But how to
make sure the previous run was complete before copying over the next file and
restarting the job?
Set up a For Each
File Loop
The first step was to set up a simple For Each File Loop
in SSIS. In this case, as depicted
below, we wanted to move 1 file at a time from our staging directory into our
processing folder.
Once the file was in the processing folder, our package
fired the SQL Job to start the deployed SSIS packages to load the DW.
Check for the job to
finish
Once the deployed job was running, we had to know when it
was completed so the package could start the next file sequentially. For this we created a For Loop which
contained an Execute SQL Task which checked our custom DW Audit tables to see
if the job was completed. The SQL Task
simply set variable @IsLoadComplete == 1 when the last run was complete, and
the For Loop continued until @IsLoadComplete = 1.
This led to a problem though – the Loop Until Complete
container would execute the SQL Task continuously
until the loop condition was met.
We did not want to constantly query the server tens of thousands of
times until the job was complete.
Have SSIS wait a
specified period of time
Since SSIS does not have a native “Wait” task, our quick solution
was to create a nested For Loop that does nothing with an evaluation
expression that checks the current time until its 10 minutes after the start
time of the container. This causes the “Check”
SQL statement to run only once per 10 minutes.
EvalExpression: DATEADD("mi",
10, @[System::ContainerStartTime]) > GETDATE()
The finished product is below (ignore the warning on the
send mail task).
How does it perform?
In our case, since we were just loading a QA environment and
wanted a quick and dirty solution, we were not concerned with performance. We successfully allowed this design to run
in VS on our local medium powered machine overnight while processing multiple
files without issue. However, a quick
check of CPU usage did show a 30% spike in CPU usage – so in the end this
solution won’t win any performance awards.
In hindsight, was
there a better way? (hint: yes)
While writing this post, and checking the CPU usage, I
started thinking there had to be a more efficient way. Since I am not a programmer by nature, I went
straight for available SSIS components instead of opening up a script
task. But a quick search for “SSIS
Script Task Wait” found several examples (such as Mike Davis) of doing this with
a Script Task – and sure enough using a
Script Task with code such as below, performs the same function with much less
CPU usage.
Public Sub Main()
'
Dim sec
As Double = Convert.ToDouble(Dts.Variables("intDelayTime").Value)
Dim ms
As Int32 = Convert.ToInt32(sec * 1000)
System.Threading.Thread.Sleep(ms)
'
Dts.TaskResult =
ScriptResults.Success
End Sub
So if we had to do it again, replacing the “Do Nothing” For
Loop with a Script Task such as the above would probably be less resource
intensive and accomplish the same thing!
Labels: DW, Script Task, SSIS