Integration Services, the 2005 replacement of good old Data Transformation services, has so many control flow and data flow possibilities, that you often have more than one way to do a particular task. One of those things is inserting a bunch of data from a CSV file into sql server. I started some experiments on this, and the first results (please wait for more detailed results before we can draw reliable conclusions) show that Bulk Insert is not the fastest way to bulk insert data!
The setup: I have one file of 272 Mb (1 million records) of random data: 5 columns varchar(49), and 5 columns of integer data. I insert these in an empty table in SQL Server 2005 developer edition. There are no indexes nor constraints on this table. I try this insert in a number of ways:
- With the bulk insert control flow task
- With a data flow with a flat file source and
- an OleDb destination (default settings)
- an OleDb destination using the fast load option
- a SQL Server destination
All experiments where run 3 times, all on the same machine (the IS package is run at the same machine where SQL Server is running, which is a Pentium 4, 3 GHz, 2 Gb ram, 1 processor machine (heavy desktop)). Below, I show you the average execution time for each of these approaches:
- Bulk insert : 49 seconds
- OleDb destination: 460 seconds
- OleDb destination fast load: 39 seconds
- SQL server destination: 24 seconds
It's obvious that the OleDb destination is slower than the bulk insert, but I have to admit I was surprised to see both the fast load OleDb and the SQL Server destination to be faster (and for the SQL destination it is even about 2 times faster!) than the bulk insert. I'm going to run some more tests against this to see why this unexpected behaviour pops up.
Nico