Find Duplicates in PowerShell Datatable using LINQ

Update 1/4: Removed conversion of datatable to list. That just wasted memory, as this works on all enumerables.

I don’t have a practical application for this method, but I wanted to figure out how to do it anyway. Doing so was frustrating but kinda fun. It forced me to learn about lamdas in PowerShell which I’ll probably use at some point in the future. Nivot Ink had some straightforward syntax that helped me, along with Matt Graeber’s post on PowerShellMagazine.

Esentially, I was looking to recreate the following SQL statement in LINQ: SELECT ColumnName, COUNT(*) as DupeCount FROM TableName GROUP BY ColumnName HAVING COUNT(*) > 1. The PowerShell LINQ Equivalent is this basically:

Sample Project

In this sample project, we’ll create some fake data, then query it using LINQ. I tested this on larger data sets and found it got exponentially slower. The OleDBConnection method I wrote about earlier still smokes this technique. Then again, when it comes to .NET, I have little idea of what I’m doing, so maybe it can be optimized. I’m trying to figure out a way to query this list with CreateQuery, too.

Create Datatable with Dupes

Prep the dupe check

First we have to decide what columns to examine. In this example, duplicate artist+album constitutes a duplicate.

Populate new datatable with results and clean up

The above code was prep work, now we’re going to actually execute the query, and populate a datatable with the results. Unfortunately, there’s no good way to convert a Linq list to a Datatable so we’ll just use .Clone() to clone build the new table schema, and then perform an ImportRow.

And voila, the results

get-dupes

Want to see the code all in one shot? Check out the gist.

Chrissy is a Cloud and Datacenter Management MVP who has worked in IT for over 20 years. She is the creator of the popular SQL PowerShell module dbatools, and holds a number of certifications, including those relating to SQL Server, Linux, SharePoint and network security. You can follow her on Twitter at @cl.

Posted in PowerShell
4 comments on “Find Duplicates in PowerShell Datatable using LINQ
  1. phil Factor says:

    Wow! I loved that. I didn’t think it was possible to do this in PowerShell

    • Chrissy LeMaire says:

      Senpai notice me :D

      I’m a super huge fan and have used your articles extensively in learning SQL and PowerShell. Thank you! I recently released a PowerShell module that can import 25,000,000 rows from CSV in less than two minutes.

      I think you may enjoy playing. Right now, I’m trying to see if LINQ can be used at the $bulkcopy.WriteToServer().

  2. CuteDev says:

    Hi, thanks for such an awesome post, i know m looking to this post after an year ;) but many of us are not even aware that was possible through powershell

    • Chrissy LeMaire says:

      Thank you! I was just on this post again last night exploring how I’d potentially get Linq to turn a CSV into streaming IDataReader. Haven’t accomplished it yet ;)

Leave a Reply

Your email address will not be published. Required fields are marked *

*