Using .NET DataTable Computes and Column Expressions in PowerShell

As a SQL Server DBA, I'm used to SQL being able to perform aliasing, super fast subselects, and easy aggregation. My most recent project, Invoke-Locate.ps1, however, uses SQLite because of its portability and speed when performing simple selects and inserts. The downside is that SQLite doesn't handle some subselects very well, and I was left with manipulating data on the client-side.

Once I ported locate, I wanted to use the data locate gathered to replicate the results of another favorite tool of mine in Linux, du, which displays current disk usage information. This is helpful when you want to find out what's using all of your disk space. Specifically, I wanted to replicate the output seen when performing du -all -h.

suse-du2

For the PowerShell translation, I ultimately used $datacolumn.Expression and $datatable.Compute to help aggregate information about disk space usage when the SQL-equivalent subselects proved too resource intensive for SQLite. There was a pure SQLite solution using temp tables, the code was ugly, and tasks like this seem exactly what Expressions and Computes were made for.

psdu

You may notice that du says 69MB, while my output shows 68MB. This is because of slightly different rounding. Windows Explorer shows the size of the folder as 68.3MB.

Below I outline what I've learned about DataTable Computes and Column Expressions and how I used them.

What is DataTable Compute?

DataTable Computes aggregate data within an entire table in the format $datatable.Compute($expression, $filter). Think of the filter as a WHERE clause, and the expression like a SELECT Agg(). Microsoft provides the following example (translated to PowerShell). In this example, the computation would show the SUM of all Orders where the EmpID column equals 5.

$dataset.Tables["Orders"].Compute("Sum(Total)", "EmpID = 5")

I had multiple columns to compute, so I added new columns to my data table, and populated each new column with the computed information.

 1[void]$datatable.Columns.Add("totalkb",[int64])
 2[void]$datatable.Columns.Add("totalmb",[int64])
 3[void]$datatable.Columns.Add("totalgb",[int64])
 4
 5foreach ($row in $datatable.rows) {
 6     try {
 7          $directory = $row.name 
 8          $where = "name like '$directory\*' or name = '$directory'"
 9          $row["totalkb"] = ($datatable.Compute("sum(kb)",$where)) 
10          $row["totalmb"] = ($datatable.Compute("sum(mb)",$where))
11          $row["totalgb"] = ($datatable.Compute("sum(gb)",$where))
12     } catch { Write-Warning "Could not parse $filename info." }
13}

This code basically translates to "For every directory, give me the sum of kilobytes, megabytes and gigabytes of all of its subdirectories, and add this data to the row's totalkb, totalmb, and totalgb columns."

What is a DataColumn Expression?

Column expressions are intended to filter rows, calculate the values in a column, or create aggregate columns. They provide similar functionality akin to WHERE, SUM() or CASE in a SQL statements. Microsoft's data column expression page is actually pretty thorough and provides the following straightforward examples:

1$dataset.Tables("Orders").Columns("OrderCount").Expression = "Count(OrderID)"
2$dataset.Tables("Products").Columns("tax").Expression = "UnitPrice * 0.086"

Here's another simple example: say you have a column in your datatable that contained file sizes in bytes. Column Expressions can be used to display the file size as kilobytes, megabytes and gigabytes.

1$kb = $datatable.Columns.Add("kb")
2$kb.Expression = "bytes * 1024"
3
4$mb = $datatable.Columns.Add("mb")
5$mb.Expression = "kb * 1024"
6
7$gb = $datatable.Columns.Add("gb")
8$gb.Expression = "mb * 1024"

Sample Project

In this sample project, we’ll populate a data table with file information, then use Compute and Expression to display the results in our desired format.

Create Basic DataTable and Populate with Data

Here, we'll create a DataTable that has just three columns: name, directory and bytes.

 1# Create Basic Datatable
 2$datatable = New-Object System.Data.Datatable
 3[void]$datatable.Columns.Add("name")
 4[void]$datatable.Columns.Add("directory")
 5[void]$datatable.Columns.Add("bytes", [double])
 6
 7# Add data
 8[void]$datatable.Rows.Add("locate","C:",1.0)
 9[void]$datatable.Rows.Add("Backup","C:\locate",1.0)
10[void]$datatable.Rows.Add("Invoke-Locate.ps1","C:\locate",39630.0)
11[void]$datatable.Rows.Add("System.Data.SQLite.dll","C:\locate",1122304.0)
12[void]$datatable.Rows.Add("install.png","C:\locate\Backup",22151.0)
13[void]$datatable.Rows.Add("Installandsearch.png","C:\locate\Backup", 34596.0)
14[void]$datatable.Rows.Add("invoke-locate-directory.png","C:\locate\Backup",47251.0)
15[void]$datatable.Rows.Add("Invoke-Locate-install-search.png","C:\locate\Backup",34596.0)
16[void]$datatable.Rows.Add("Invoke-Locate.ps1","C:\locate\Backup",9797.0)
17[void]$datatable.Rows.Add("locate.sqlite","C:\locate\Backup",0278144.0)
18[void]$datatable.Rows.Add("locate.xml","C:\locate\Backup",3534.0)
19[void]$datatable.Rows.Add("measure.png","C:\locate\Backup",10603.0)
20[void]$datatable.Rows.Add("sctask-output.txt","C:\locate\Backup",57.0)
21[void]$datatable.Rows.Add("search-verbose.png","C:\locate\Backup",34987.0)
22[void]$datatable.Rows.Add("search.png","C:\locate\Backup",31919.0)
23[void]$datatable.Rows.Add("System.Data.SQLite.dll","C:\locate\Backup",1122304.0)
24[void]$datatable.Rows.Add("Update-LocateDB.ps1","C:\locate\Backup",73.0)

du4

Now Transform The Data

Here are simple expressions that concatenate strings, and change bytes into kilobytes, etc. In my own PowerShell project, Invoke-Locate.ps1 I actually used a SQLite view to accomplish similar results.

Create columns which display the full path using directory and filename.
1$fullname = $datatable.Columns.Add("fullname")
2$fullname.Expression = ("directory+'\'+name")
Create columns to display bytes as kilobytes, megabytes, and gigabytes.
1$kb = $datatable.Columns.Add("kb",[double])
2$kb.Expression = "bytes / 1024"
3
4$mb = $datatable.Columns.Add("mb",[double])
5$mb.Expression = "kb / 1024"
6
7$gb = $datatable.Columns.Add("gb",[double])
8$gb.Expression = "mb / 1024"

Here's what the datatable looks like now:

du5

Create new columns that will contain the sizes of each directory
1[void]$datatable.Columns.Add("totalkb",[int64])
2[void]$datatable.Columns.Add("totalmb",[int64])
3[void]$datatable.Columns.Add("totalgb",[int64])
Populate new totals columns using DataTable Computes

In the example below, I'm performing the equivalent of a sub select in SQL. With DataTables, however, each column must be populated line by line because Computes are run against an entire table, so you can't just have one summarizing column.

1foreach ($row in $datatable.rows) {
2    try {
3        $fullname = $row.fullname 
4        $where = "fullname like '$fullname\*' or fullname = '$fullname'"
5        $row["totalkb"] = ($datatable.Compute("sum(kb)",$where)) 
6        $row["totalmb"] = ($datatable.Compute("sum(mb)",$where))
7        $row["totalgb"] = ($datatable.Compute("sum(gb)",$where))
8    } catch { Write-Warning "Could not parse $fullname info." }
9}

Now the datatable has useful information, but it's not quite clean enough. Nor does it look like the Linux du command, which is what I was after.

du3

Use the totals data to make a single column which displays human readable file sizes

Now to use a slightly more advanced Column Expression: the IIF. Here, I created a new column, then used the IIF statement, which "gets one of two values depending on the result of a logical expression." The code displays the simplified size of a file. If the size is less than 1 MB (< 1025kb), then show the result in KB. Otherwise, if the size is less than 1GB, show it in MB, otherwise, show it in GB.

1$totalsize = $datatable.Columns.Add("totalsize")
2$totalsize.Expression = "IIF(totalkb<1025, totalkb + 'K', IIF(totalmb<1025, totalmb + 'M', totalgb + 'G'))"
3# Select, and display
4$datatable | Select totalsize, fullname | Sort-Object fullname | Format-Table -Auto -HideTableHeaders

The Output

du2

If you recall, we started this table with only 3 columns, and through the use of Expressions and Computes, used those three columns to produce more useful output. Too see all the code in one shot, check out the gist.