High-Performance Techniques for Importing CSV to SQL Server using PowerShell

If you’ve ever tried to use PowerShell’s Import-Csv with large files, you know that it can exhaust all of your RAM. Previously, I created a script on ScriptCenter that used an alternative technique to import large CSV files, and even imported them into SQL Server at a rate of 1.7 million rows a minute. I’ve now gotten that number up to a massive 5.35 million rows a minute for non-indexed tables and 4.35 million rows a minute for tables with clustered indexes.

Importing into SQL Server

Two of the fastest ways to import data are bcp and SqlBulkCopy. bcp is finicky and requires a separate download. My intention is to create a script that can be used on any machine running PowerShell v3 and above, so these scripts will be using SqlBulkCopy. The catch is that SqlBulkCopy.WriteToServer() requires a DataTable, DataRow array, or IDataReader as input. This means your plain-text CSV data must be transformed into one of these objects prior to being passed to WriteToServer().

I’ve tested three different methods for transforming CSV data into a consumable object. The first method uses SqlBulkCopy, StreamReader and DataTable batches. The second streams an OleDbConnection query result set to SqlBulkCopy. The third uses SqlBulkCopy, Microsoft.VisualBasic.FileIO.TextFieldParser and DataTable batches. I actually tested LogParser, too, but while LogParser is amazing (and way, way faster) at parsing files, it doesn’t take advantage of SqlBulkCopy, so these approaches actually outperform LogParser (and my tests with SSIS, for that matter).



My lab is a VMware virtualized lab which now consists of a Windows 10 machine with 6GB RAM, and a SQL Server 2014 instance with 12GB RAM (8 dedicated to SQL Server). Both are on the same ESX host, which sports a directly connected Samsung 850 EVO 500GB SSD drive. The database’s recovery model is set to SIMPLE.

The 1.2 GB CSV in this example contains 10 million+ rows of tab-delimited longitude and latitude data. A zip containing the CSV is available for download at geonames.org.

First, I had to create the SQL table.
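The exact schema isn’t critical to the technique, but as a sketch, the table-creation step looks something like this (the server, database, and column definitions here are placeholders based on the geonames data — adjust them to your own file):

```powershell
# Sketch only: server/database/columns are assumptions, not the exact lab schema
$connectionstring = "Data Source=sqlserver;Integrated Security=true;Initial Catalog=locations;"
$query = @"
CREATE TABLE allCountries (
    GeoNameId int,
    Name nvarchar(200),
    Latitude decimal(10,7),
    Longitude decimal(10,7)
)
"@
$conn = New-Object System.Data.SqlClient.SqlConnection($connectionstring)
$conn.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($query, $conn)
$null = $cmd.ExecuteNonQuery()
$conn.Close()
```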

Then I tested the various methods. You’ll probably notice that these scripts don’t have much error handling. I wanted to keep the code easy to read and straightforward. Finalized scripts with error handling and all that will be placed on ScriptCenter shortly.

StreamReader to DataTable batches

This one is by far the fastest, at up to 5.35 million rows per minute on a non-indexed table.


Realistically, you’ll probably be importing to an indexed table, which performs up to 4.35 million rows per minute on optimized systems.


Basically, the script performs the following steps:

  1. Creates the SQL Bulk Copy connection
  2. Creates the PowerShell datatable, along with its columns
  3. Reads the CSV file using System.IO.StreamReader
  4. Using ReadLine(), loads the CSV data row by row into the datatable
  5. Performs the bulk import every x number of rows
  6. Empties the datatable
  7. Rinse repeat starting at step 4 until the end of the CSV file

In order to avoid having to specify column names each time you import a new CSV, the SQL table column order and CSV column order must be the same. If you need to rearrange your columns, I suggest creating a view and importing to the view.

If you’re wondering why batching was used, it’s because the datatable seems to get exponentially slower as it grows above a few hundred thousand rows. I actually wasn’t able to import all 9 million rows into the datatable before my memory was exhausted. I tested various batch sizes, and 50k worked best for me.

The downside to using this script is that it doesn’t handle embedded delimiters well. So if you have a comma delimiter, and your data contains "This Works","This, doesn’t", then it will fail on "This, doesn’t." You can address this with a regex (as seen below, after the code).

The Script
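The following is a pared-down sketch of the steps above; the server, database, table, and file-path values are placeholders you’ll need to adjust:

```powershell
# Sketch of the StreamReader-to-DataTable batch import; names/paths are placeholders
$sqlserver = "sqlserver"
$database = "locations"
$table = "allCountries"
$csvfile = "C:\temp\allCountries.txt"
$csvdelimiter = "`t"
$firstRowColumnNames = $false

# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000

$elapsed = [System.Diagnostics.Stopwatch]::StartNew()

# 1. Create the SQL Bulk Copy connection
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.BatchSize = $batchsize

# 2. Create the datatable and auto-generate its columns from the first row
$datatable = New-Object System.Data.DataTable
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
foreach ($column in $columns) { $null = $datatable.Columns.Add() }

# 3. Read the CSV file using a StreamReader
$reader = New-Object System.IO.StreamReader($csvfile)
if ($firstRowColumnNames -eq $true) { $null = $reader.ReadLine() }

# 4-7. Load rows, bulk import every $batchsize rows, empty the datatable, repeat
$i = 0
while (($line = $reader.ReadLine()) -ne $null) {
    $null = $datatable.Rows.Add($line.Split($csvdelimiter))
    $i++
    if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($datatable)
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $datatable.Clear()
    }
}

# Import any leftover rows and clean up
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
Write-Host "Script complete. $i rows have been inserted into the database."
```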

As I mentioned earlier, you can use regex if you have delimiters embedded in quotes. This slows down the script by a fair degree, but here’s some code to get you started.
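Something along these lines works as a starting point (a sketch that assumes well-formed quoting — it is not a full CSV parser):

```powershell
# Split on commas only when they fall outside of double quotes.
# Assumes balanced, well-formed quoting; trim the surrounding quotes afterward.
$csvSplit = '(,)(?=(?:[^"]|"[^"]*")*$)'
$regexOptions = [System.Text.RegularExpressions.RegexOptions]::ExplicitCapture

$line = '"This Works","This, doesn''t"'
# The embedded comma inside the second field is not treated as a delimiter
$fields = [regex]::Split($line, $csvSplit, $regexOptions)
```

Inside the import loop, you’d swap `$line.Split($csvdelimiter)` for the regex split above, which is where the slowdown comes from.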

Streaming a 32-bit OleDbConnection

What I like about this method is that it’s still high performance (~2 million rows/minute) and you can select subsets of data from the CSV if you wish. Just change the CommandText to include a WHERE clause, for example: $cmd.CommandText = "SELECT * FROM [$tablename] WHERE Category='Cajun'". If your CSV doesn’t have column names, you can use the built-in column names F1, F2, F3, and so on.

Here are the results from a million row subset:

The downside to this script is that you have to use an x86 shell, but because it streams the data instead of loading it into RAM, the performance actually isn’t impacted. Switching to another shell just gets a little complex if you embed this script in a function. I’ve mitigated this to the best of my ability by having the script call the 32-bit shell with the current script name and arguments.

The Script
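Here’s a sketch of the idea (paths, names, and the text-driver connection string are placeholders; this must run in a 32-bit shell because the Jet text driver is 32-bit):

```powershell
# Sketch: stream an OleDb query result set straight into SqlBulkCopy
$sqlserver = "sqlserver"; $database = "locations"; $table = "allCountries"
$csvdir = "C:\temp"            # directory containing the CSV
$csvfile = "allCountries.txt"  # the file name acts as the "table" name

$elapsed = [System.Diagnostics.Stopwatch]::StartNew()

# Text driver connection; a schema.ini in $csvdir can specify the delimiter
$connstring = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=$csvdir;Extended Properties='text;HDR=YES;FMT=Delimited'"
$conn = New-Object System.Data.OleDb.OleDbConnection($connstring)
$conn.Open()
$cmd = New-Object System.Data.OleDb.OleDbCommand("SELECT * FROM [$csvfile]", $conn)

# The OleDbDataReader implements IDataReader, so it streams to WriteToServer()
$sqlconnstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($sqlconnstring)
$bulkcopy.DestinationTableName = $table
$bulkcopy.WriteToServer($cmd.ExecuteReader())

$conn.Close()
Write-Host "Finished in $($elapsed.Elapsed.ToString())"
```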

VisualBasic.FileIO.TextFieldParser to DataTable batches

This technique is the slowest, but if you’ve got embedded delimiters and need to stay in x64, you may want to consider it.


The Script
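A sketch of this variant (again, names and paths are placeholders); the key difference from the StreamReader version is that TextFieldParser does the field splitting and handles quoted, embedded delimiters for you:

```powershell
# Sketch of the TextFieldParser-to-DataTable batch approach
[void][Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic")
$csvfile = "C:\temp\allCountries.txt"
$csvdelimiter = "`t"
$batchsize = 50000

$connectionstring = "Data Source=sqlserver;Integrated Security=true;Initial Catalog=locations;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring)
$bulkcopy.DestinationTableName = "allCountries"
$bulkcopy.BatchSize = $batchsize

$datatable = New-Object System.Data.DataTable

$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($csvfile)
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters($csvdelimiter)
$parser.HasFieldsEnclosedInQuotes = $true  # this is what handles embedded delimiters

# Use the first row to size the datatable's columns
$columns = $parser.ReadFields()
foreach ($column in $columns) { $null = $datatable.Columns.Add() }
$null = $datatable.Rows.Add($columns)  # keep row one if it isn't a header row

$i = 1
while (!$parser.EndOfData) {
    $null = $datatable.Rows.Add($parser.ReadFields())
    $i++
    if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($datatable)
        $datatable.Clear()
    }
}
if ($datatable.Rows.Count -gt 0) { $bulkcopy.WriteToServer($datatable); $datatable.Clear() }
$parser.Close()
```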

VB.NET Equivalent

Using a VB.NET compiled exe imports about 4 million rows a minute.


There are a number of tools that can be used to import CSV files into SQL Server. If you’re looking to use PowerShell, the StreamReader will give you the fastest results, but if your data has embedded delimiters, this may not be the method for you. Using OleDbConnection is a powerful, fast way to import entire CSVs or subsets of CSV data, but requires the use of an x86 shell, unless you download the Microsoft Access Database Engine 2010 Redistributable, which provides a 64-bit text driver. Microsoft.VisualBasic.FileIO.TextFieldParser is slower, but works in 64-bit and processes embedded delimiters well.

Chrissy is a PowerShell MVP who has worked in IT for nearly 20 years, and currently serves as a Sr. Database Engineer in Belgium. Always an avid scripter, she attended the Monad session at Microsoft’s Professional Developers Conference in Los Angeles back in 2005 and has worked and played with PowerShell ever since. Chrissy is currently pursuing an MS in Systems Engineering at Regis University and helps maintain RealCajunRecipes.com in her spare time. She holds a number of certifications, including those relating to SQL Server, SuSE Linux, SharePoint and network security. She recently became co-lead of the SQL PASS PowerShell Virtual Chapter. You can follow her on Twitter at @cl.

26 comments on “High-Performance Techniques for Importing CSV to SQL Server using PowerShell”
  1. Eric Bouchard says:

    Hi Chrissy.

    Trying to get your StreamReader to DataTable batches example working.

    I’ve got a conversion error between string and int (and decimal) when calling WriteToServer.

    Since every column in the DataTable is of type String, what is wrong?

    In the example in VB.NET, columns types are provided, so everything works fine.

    What am I missing?

    • Chrissy LeMaire says:

      Hey Eric,
      That is strange. I’ve had luck with int and decimal. Are implicit conversions allowed on your database?

      Try removing this part:

      foreach ($column in $columns) {
          $null = $datatable.Columns.Add()
      }

      And then explicitly specify your column names. For example:
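      Something like this (the column names and types here are just examples — match them to your own table):

```powershell
# Explicitly typed columns; names and types are examples, not Eric's actual schema
[void]$datatable.Columns.Add("GeoNameId", [int])
[void]$datatable.Columns.Add("Name", [string])
[void]$datatable.Columns.Add("Latitude", [decimal])
[void]$datatable.Columns.Add("Longitude", [decimal])
```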


      • Eric Bouchard says:

        Thanks for your help.

        What do you mean by “Are implicit conversions allowed on your database?”

        How can I do that in SQL Server?

        I already use the solution you provided. But I still have another problem.

        Parsing my CSV, which has empty values (empty strings) that should be NULL for Integer columns (for example), I received an error message that I cannot load an empty string into an Int32 (or something like that).

        So I have to convert my empty values to System.DBNull.Value in those cases, which slows the overall process.

  2. Paul says:

    Hello Chrissy,

    I found your post and I set up your script and it works great. One question: if I run this script nightly to load new entries into the SQL database, is there a way I can have it append the DB entries or overwrite them?


  3. Praveen says:

    Hey Chrissy,

    Thanks a lot for the awesome script.
    The script works perfectly if the table does not have an IDENTITY Column.

    I have a column [UniqueID] [bigint] IDENTITY(1,1) NOT NULL,
    Is there any way to insert data into a SQL Table with IDENTITY COlumn using the bulkCopy method?


    • Praveen says:

      I’ve figured it out..!!
      The script actually works for identity columns as well.
      The catch is that the columns need to be in the same order in the CSV, the SQL table, and the PowerShell datatable.
      I populated the identity column in the CSV with dummy values and executed the script.
      It worked!! The SQL table ignored the dummy values from the datatable and continued the identity values from where they ended earlier.

  4. mah says:

    How can I ignore or import rows that have an extra separator (comma)?
    thanks for your attention

    • Peter says:


      If you are using VB.NET you can use Lumenworks’ free open-source CSV reader instead of StreamReader (with the Split command). I had to parse the fields back into a String() using the following function:

      Public Shared Function GetStringArray(iNumFields As Integer, crCsvReader As CsvReader) As String()
          Dim saValues(iNumFields - 1) As String
          For i As Integer = 0 To iNumFields - 1
              saValues(i) = crCsvReader(i)
          Next
          Return saValues
      End Function

      I then fed this to the datatable.

  5. Sethu Vaithiyanathan says:

    Hi Chrissy,

    Thanks for the excellent script. But just curious to know if there is an equivalent PS script to copy a table’s data between 2 different SQL instances. At present, we tried the SSIS Data Flow task and the speed is not on par. Please let me know if there is any modified version of this script to use as an alternative for SQL import and export. Thanks a lot!!


    • Chrissy LeMaire says:

      Actually, yes, I’ve done this a bunch of times before and I plan on making it part of the dbatools module.

      You’d basically do a $bulkcopy.WriteToServer($sqlcmd.ExecuteReader())
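      As a rough sketch (instance, database, and table names here are placeholders):

```powershell
# Stream a table from one instance to another without materializing it in RAM
$sourceconn = New-Object Data.SqlClient.SqlConnection("Data Source=server1;Integrated Security=true;Initial Catalog=db1;")
$sourceconn.Open()
$sqlcmd = New-Object Data.SqlClient.SqlCommand("SELECT * FROM dbo.mytable", $sourceconn)

$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy("Data Source=server2;Integrated Security=true;Initial Catalog=db2;")
$bulkcopy.DestinationTableName = "dbo.mytable"

# ExecuteReader() returns an IDataReader, so the rows stream straight through
$bulkcopy.WriteToServer($sqlcmd.ExecuteReader())

$bulkcopy.Close(); $sourceconn.Close()
```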

  6. Peter says:


    Very nice.

    I’m sure that you already know that the speed depends on the number of columns as well as the number of rows. These performance numbers are both interesting and amazing.

    I can’t wait to try this with a very large CSV file. I specifically need to create a very generic CSV importer to SQL that creates an SQL table with its field names, then pulls the data into it. Your code makes the import process very quick. I think using it with the Lumenworks CSV reader, which will validate the data, will allow this utility to work very well.

    Thanks again,


    • Chrissy LeMaire says:

      Peter, great point! Maybe I’ll include LumenWorks in my official module. I hate the current approach because it doesn’t handle messy data well. This was mostly a proof of concept, but yeah, SUPER FAST = no error handling, which is fun, but not useful long term.

      I came up with a way to do it even faster (25 million rows in less than 2 minutes using multithreading) but the garbage collection doesn’t collect fast enough so it suffers from memory issues.

  7. DumbGeek says:

    As someone described above, if there is an identity column, the script might fail. To fix it:

    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)

  8. JD says:

    Hi there,
    $null = $datatable.Rows.Add($([regex]::Split($line, $csvSplit, $regexOptions))) inserts double quotes into the table. Is there a way we could avoid it from happening?

    • Chrissy LeMaire says:

      Hey JD, I need to revamp this article. I don’t recommend these techniques for most data. It turns out the two datasets I was working with were nice and perfect and anything outside of that is subject to breakage. The second option (OLE) in this case will remove the quotes.

      Btw, where did you find the site? There seems to be more comments than usual.

      If you guys wanna see what I did with runspaces, I gave a talk at PSCONF.eu. My slides are stacked with code and stuff.


      • JD says:

        Yep, I’ll take a look at it. BTW, Is there a recorded session for psconfeu? It would be great if it was out there. :)

        • Chrissy LeMaire says:

          Waiting! It was recorded but they are doing the edits still. I’ll post here when it’s out :D

          • JD says:

            Awesome. Thanks! Meanwhile, let me try to figure out runspaces. I’m sure there is a steep learning curve there.

  9. mbourgon says:

    I’m wondering if it would be possible to make a “matching importer”, where it queries a table for its field list, compares to the header row, then creates a smaller $datatable with just the matching fields, and then insert them. That way if there’s an extra field in the second file sometimes, then it would ignore it.

    I got rather far into it but I think I’m stuck – I get all the way up to inserting into the datatable. At that point, because it’s a simple insert, there’s no way to tell it “just do these fields”. I was hoping that $datatable.Rows.Add($line.Split($csvdelimiter)) would realize that some fields don’t exist, and wouldn’t add them. Does not appear so.

    #adapted from https://blog.netnerds.net/2015/01/powershell-high-performance-techniques-for-importing-csv-to-sql-server/

    . C:\powershell_scripts\Join-Object.ps1
    . C:\powershell_scripts\invoke-sqlcmd2.ps1
    $csvdelimiter = "|"
    $csvfile = "c:\temp\myfile.txt"
    # 50k worked fastest and kept memory usage to a minimum
    $batchsize = 50000

    remove-variable header_columns
    remove-variable list_of_fields

    $list_of_fields = invoke-sqlcmd2 -query "SELECT name FROM sys.columns WHERE OBJECT_ID = OBJECT_ID('view_that_i_insert_into') ORDER BY column_id;" -serverinstance "myserver" -Database mydb

    #the whole purpose of the select-string and pattern is to get it back into an object array for the join-object
    [regex]$pattern = "[^|]" #using the pipe because I know it won't be in there.
    $header_columns = (Get-Content $csvfile -First 1).Split($csvdelimiter) | select-string -pattern $pattern

    #external module by powershell team. Gives us our inner join, and now we just get the fields that match.
    $fields_to_import = join-object -left $header_columns -Right $list_of_fields -Where {$args[0].Line -eq $args[1].name} -leftproperties Line -rightproperties name -type OnlyIfInBoth | select Line

    remove-variable datatable
    $datatable = New-Object System.Data.DataTable

    # creates the fields in the datatable to be populated with. It will have no rows, though
    # This is the target of the join-object.
    foreach ($column in $fields_to_import) {
        $null = $datatable.Columns.Add()
    }

    # Open the text file from disk - doing this here because we did it with get-content earlier. If it's slow, use that
    $reader = New-Object System.IO.StreamReader($csvfile)

    if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }

    # Read in the data, line by line
    #alas, this doesn't look like it'll work; it grabs the entire string and inserts it into the datatable. Which won't work since the datatable is just "what matches".
    while (($line = $reader.ReadLine()) -ne $null) {
        $null = $datatable.Rows.Add($line.Split($csvdelimiter))
        $i++; if (($i % $batchsize) -eq 0) {
            Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        }
    }

  10. corey says:

    So… any way to turn the process around, and write data to text files quickly?

    SSIS is OK, but…

  11. Chris Ferry says:

    Hi Chrissy,

    Great Script. I am wondering if the StreamReader “split” method expects a Delimiter to appear after the last column record for a given line.

    I was able to successfully run the script for one set of data which had the delimiter after the final column record, but was unable to do the same when the data had no comma after the last record.

    Example 1 (Works)
    John Smith, [email protected], 12345678,
    Jane Smith, [email protected], 987654321

    Example 2 (Does Not Work)
    John Smith, [email protected], 123456789
    Jane Smith, [email protected], 987654321

    It is possible I am breaking at another point, but the error I am getting states “Additional information: Input array is longer than the number of columns in this table.”

    Anyway, I have been trying to get it to run over a few specific file formats, if you happen to know if the method expects the end delimiter, let me know as I have been wrestling with it for a while.

    Thanks again for the great solution.

  12. Anderson says:


    Interesting script, but is there any way to export millions of events? I’m trying to export more than 7 million events and only get timeouts and high memory consumption.

    There must be an option to optimize the export when you are using a variable to store all events.

    Regards and thanks

  13. Melinda says:

    I have a TAB-DELIMITED .txt file and I’m getting the following errors:
    Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to type bigint of the
    specified target column."
    At C:\Users\MELINDA.SAVOY\MyDirectory\TEST_CopyToCFLE_table.ps1:60 char:9
    + $bulkcopy.WriteToServer($datatable)
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : InvalidOperationException

    100000 rows have been inserted in 00:00:35.3103642.
    Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to type decimal of the
    specified target column."
    At C:\Users\MELINDA.SAVOY\MyDirectory\TEST_CopyToCFLE_table.ps1:60 char:9
    + $bulkcopy.WriteToServer($datatable)
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : InvalidOperationException

    The same data types are defined in my data source (postgresql) as they are in the table of my sql server table. I’m not sure how to correct. Any help/direction would be appreciated. Thanks.

    • Anderson says:

      Thanks for your response, but the point that I am trying to highlight is about exporting data (event logs) to a CSV in a faster way. Because, as you said, RAM consumption is a concern. Using a variable is not recommended because of RAM consumption; instead you can export to a CSV, but the process is slow. Your explanation is focused on importing data to SQL, and that is not my concern.

  14. Michiel says:

    It’s brilliant. I was looking for a method to import 623k zipcodes from a CSV into a database table, and the PowerShell script works better than expected. I decreased the batch size to 500, just for fun actually and to see some more lines passing through, and it took only 14 seconds. The previous import with a regular .sql file full of insert statements took 15 minutes.
    So thank you =)
