A while back, I wrote about High-Performance Techniques for Importing CSV to SQL Server using PowerShell. Earlier today, I released a PowerShell module, CsvSqlImport.psm1, which puts that post into practice.
Using this module, it’s possible to import over 90,000 records a second in optimized environments. In addition to being speedy, it’s also straightforward to use and can be easily scheduled for ongoing imports.
Check out how quick 1,000,000 rows can be imported when the -Turbo switch is used:
Why isn’t Turbo used by default? Well there’s very little error management (that slows things down) so it may error out when other methods (default and -Safe) may not.
Why not just use the Import/Export Wizard?
Because it stopped being easy for me awhile back (and it’s purely GUI, and requires additional software and it can’t import multiple files at once, and so on.)
Remember when importing CSVs into SQL Server using dtswizard didn’t require any conversions or fanciness? You’d just select your file and your delimiter, it would create the table for you using varchar(255) across the board, and plink! You’d almost always get the “Successfully copied 1 table(s) from Flat File to Microsoft SQL Server.”
Then, the Import Wizard became more powerful but also more complicated, and quick and dirty imports suddenly took more work. For instance, whether I allow the Wizard to create a new table, or try to map to an existing one, my imports in SQL Server 2005 and above nearly always go like this:
Noooo! What is all this mapping? I just want it to work then I can mess with my datatypes later. *Next*
“Well, damn. I should just write a PowerShell function to do this for me.” And this is basically how Import-CsvToSql came about.
Eventually, I’ll have support for a GUI, too. I created one, but got too caught up in best practices, and decided to put it off while until I have time to learn more about WPF (which I love, btw.)
Like many PowerShell functions, Import-CsvtoSql supports the -First parameter. As I mentioned earlier, it can even import multiple files at once, so long as they’re similarly formatted. It also allows you to query the CSV file for subsets of data (think select address, cost from csv where State = ‘Louisiana’), and it supports all the options within SqlBulkCopy like Table Lock, Check Constraints, Fire Triggers, Keep Identity, and Keep Nulls.
Most of my imports have been pretty basic, but I did add in features that I thought others would appreciate. If you have any more you’d like me to add, let me know. Oh, also, I did my best to comment the hell out of the code, so if you’re wondering how it works, it should be easy to figure out.
Give the script a try and let me know how you like it. You only need PowerShell 3.0 and above. No additional software is required. If you run into any issues, send a portion of your CSV file to me at [email protected], and I’ll take a look to see what’s going on.