Find Duplicates in PowerShell Datatable using LINQ

Update 1/4: Removed conversion of datatable to list. That just wasted memory, as this works on all enumerables.

I don’t have a practical application for this method, but I wanted to figure out how to do it anyway. Doing so was frustrating but kinda fun. It forced me to learn about lamdas in PowerShell which I’ll probably use at some point in the future. Nivot Ink had some straightforward syntax that helped me, along with Matt Graeber’s post on PowerShellMagazine.

Esentially, I was looking to recreate the following SQL statement in LINQ: SELECT ColumnName, COUNT(*) as DupeCount FROM TableName GROUP BY ColumnName HAVING COUNT(*) > 1. The PowerShell LINQ Equivalent is this basically:

Sample Project

In this sample project, we’ll create some fake data, then query it using LINQ. I tested this on larger data sets and found it got exponentially slower. The OleDBConnection method I wrote about earlier still smokes this technique. Then again, when it comes to .NET, I have little idea of what I’m doing, so maybe it can be optimized. I’m trying to figure out a way to query this list with CreateQuery, too.

Create Datatable with Dupes

Prep the dupe check

First we have to decide what columns to examine. In this example, duplicate artist+album constitutes a duplicate.

Populate new datatable with results and clean up

The above code was prep work, now we’re going to actually execute the query, and populate a datatable with the results. Unfortunately, there’s no good way to convert a Linq list to a Datatable so we’ll just use .Clone() to clone build the new table schema, and then perform an ImportRow.

And voila, the results


Want to see the code all in one shot? Check out the gist.

Chrissy is a PowerShell MVP who has worked in IT for nearly 20 years, and currently serves as a Sr. Database Engineer in Belgium. Always an avid scripter, she attended the Monad session at Microsoft’s Professional Developers Conference in Los Angeles back in 2005 and has worked and played with PowerShell ever since. Chrissy is currently pursuing an MS in Systems Engineering at Regis University and helps maintain in her spare time. She holds a number of certifications, including those relating to SQL Server, SuSE Linux, SharePoint and network security. She recently became co-lead of the SQL PASS PowerShell Virtual Chapter. You can follow her on Twitter at @cl.

Posted in PowerShell
4 comments on “Find Duplicates in PowerShell Datatable using LINQ
  1. phil Factor says:

    Wow! I loved that. I didn’t think it was possible to do this in PowerShell

    • Chrissy LeMaire says:

      Senpai notice me :D

      I’m a super huge fan and have used your articles extensively in learning SQL and PowerShell. Thank you! I recently released a PowerShell module that can import 25,000,000 rows from CSV in less than two minutes.

      I think you may enjoy playing. Right now, I’m trying to see if LINQ can be used at the $bulkcopy.WriteToServer().

  2. CuteDev says:

    Hi, thanks for such an awesome post, i know m looking to this post after an year ;) but many of us are not even aware that was possible through powershell

    • Chrissy LeMaire says:

      Thank you! I was just on this post again last night exploring how I’d potentially get Linq to turn a CSV into streaming IDataReader. Haven’t accomplished it yet ;)

Leave a Reply

Your email address will not be published. Required fields are marked *