The Bulk Insert Task in SSIS is a tricky thing. It often creates a lot of confusion and frustration before it is willing to insert any rows into the database. This post will explain how it works differently in SSIS and how to solve the problems that many developers encounter.
The Bulk Insert task was probably included in SSIS as a means of completing a toolset with ones that SQL devs are already familiar with. However, this is the one task in SSIS that operates in a different context. Let's suppose that we have a dedicated SSIS server which receives files and needs to insert their data into another server which houses the SQL database.
You start by adding a Bulk Insert task to your package and fiddle with it locally until you can get it to execute. Then you deploy the package to a testing environment that looks like production (separate SSIS and db servers). You run the package and get this error:
[Bulk Insert Task] Error: An error occurred with the following error message: "Cannot bulk load. The file "c:\myfile.csv" does not exist.".
You check on the SSIS server and confirm that the file is there, so what is going wrong with this package? Let's look at the context that these tasks are executed:
The package running on the SSIS Server connects to the DB server and executes a BULK INSERT T-SQL statement - no different than if you ran this as an Execute SQL task or ran it manually on the db server in SSMS. When the database looks for the file on the C drive, it is actually looking on the local DB Server. And, of course, it finds nothing because the file is on the SSIS server.
Well, that's an easy fix. So now you create a fileshare on the SSIS Server and point the package to that. But when you run it again, you get this error:
[Bulk Insert Task] Error: An error occurred with the following error message: "Cannot bulk load because the file "\\SSIS_Server\Import\myfile.txt" could not be opened. Operating system error code 5(Access is denied.).".
You see that the file exists and that the your SSIS proxy account has permissions to it. But just in case, you open up access to the share to everyone. Still, you get the same error. To understand what went wrong, let's look at the new context with which we are executing:
SSIS makes the Bulk Insert request in the context of the account it is running under. When the database goes to access that share, however, the server needs to delegate those permissions. If kerberos is not properly configured, then the trust relationship breaks and the server tries to access the share anonymously, and so we have an access denied error.
There could be other reasons why access is denied. Suppose that you are accessing the server using a SQL login. The SQL login cannot be used to access the file system so the SQL Service account is used instead. This means that the folder needs to be granted permissions to the SQL Service account on the database server as well. In this case, it's no kerberos issue at all, the SQL login eliminates the double hop.
Let's suppose now that we have sorted out the kerberos problems (delegated trust on the server, created the appropriate SPNs, sacrificed a goat) or we have granted permission to the SQL Service account in the case of using a SQL Login. We must also insure that we granted Administer Bulk Operations to our proxy account as well.
Of course, there is another way. We could just ditch the Bulk Insert task and use a data flow. The data flow reads the flat file in the context of the SSIS server, there are no mysteries to unravel here. It also performs a Bulk Insert operation if you are using fast load with the destination component. I have actually made it perform better as well, so you know my preference :)
Labels: Bulk Insert, SSIS