The other day I was given a hard drive and asked to analyze what was on it. It was easy to determine that it held over 300,000 files and about 10 GB, but I didn’t want to navigate through hundreds of folders to see what was in them. A good first step, I figured, was to find the number of files and the total size for each extension.
PowerShell to the rescue. After several iterations of various techniques, I came up with a function (used just like a cmdlet) that can measure any kind of object. You specify which property to group by and which property to sum. I’m not sure what else I’ll use it for, but I love making things generic.
function MeasureGroup-Object
(
    [string]$group = $(Throw "Group Name is Required"),       # property to group by, e.g. Extension
    [string]$property = $(Throw "Property Name is Required"), # numeric property to sum, e.g. Length
    $items                                                    # optional array; pipeline input also works
)
{
    begin
    {
        # Adds one object's values to the running totals for its group
        function processItem($item)
        {
            $key = $item.$group
            if ($null -eq $Aggregate[$key])
            {
                $Aggregate[$key] = @{Count=0; Sum=0}
            }
            $Aggregate[$key].Count += 1
            $Aggregate[$key].Sum += $item.$property
        }

        # Hash table to collect stats, keyed by group value
        $Aggregate = @{}
        if ($null -ne $items)
        {
            foreach ($item in $items)
            {
                processItem $item
            }
        }
    }
    process
    {
        # Objects piped in arrive here one at a time as $_
        if ($null -ne $_)
        {
            processItem $_
        }
    }
    end
    {
        function AddProperty ($object, $name, $value)
        {
            $member = New-Object Management.Automation.PSNoteProperty $name, $value
            $object.psobject.Members.Add($member)
        }

        # Emit one PSObject per group holding the group value, Count and Sum
        foreach ($key in $Aggregate.Keys)
        {
            $obj = New-Object Management.Automation.PSObject
            AddProperty $obj $group $key
            AddProperty $obj Count $Aggregate[$key].Count
            AddProperty $obj Sum $Aggregate[$key].Sum
            $obj
        }
    }
}
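For comparison, on PowerShell 3.0 and later roughly the same table can be built from the built-in Group-Object and Measure-Object cmdlets. This is a swapped-in technique, not the function above. The $sampleFiles objects are made up to stand in for real FileInfo objects, and the [pscustomobject] cast requires PowerShell 3.0 or later.

```powershell
# Hypothetical sample input standing in for $files; any objects that have
# the grouping property (Extension) and the measured property (Length) work.
$sampleFiles = @(
    [pscustomobject]@{ Extension = '.cs';  Length = 100 }
    [pscustomobject]@{ Extension = '.cs';  Length = 250 }
    [pscustomobject]@{ Extension = '.dll'; Length = 4096 }
    [pscustomobject]@{ Extension = '';     Length = 10 }
)

# Group by Extension, then count and sum Length within each group.
$builtinStats = $sampleFiles |
    Group-Object -Property Extension |
    ForEach-Object {
        [pscustomobject]@{
            Extension = $_.Name
            Count     = $_.Count
            Sum       = ($_.Group | Measure-Object -Property Length -Sum).Sum
        }
    }

$builtinStats | Sort-Object Count -Descending
```

One difference worth knowing: Measure-Object reports Sum as a double, while the function above adds up the Length values as integers.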
So let’s put it to work. First you have to collect the objects you want to measure.
$files = Get-ChildItem -Recurse | where {$_ -is [System.IO.FileInfo]}
This recursively collects all the files under the current directory. The Where-Object filter drops the directories: most folder names have no extension, so leaving them in would inflate the stats for files without extensions.
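On PowerShell 3.0 and later, Get-ChildItem can do this filtering itself through the FileSystem provider’s -File switch. A minimal sketch, assuming PowerShell 3.0+; the -ErrorAction SilentlyContinue part is an optional addition that quiets access-denied messages from folders the current account cannot read, which can come up when scanning a whole drive.

```powershell
# PowerShell 3.0+ only: -File makes the FileSystem provider return files only,
# so no Where-Object filter is needed.
# -ErrorAction SilentlyContinue hides access-denied messages from unreadable folders.
$files = Get-ChildItem -Recurse -File -ErrorAction SilentlyContinue

# Number of files collected
$files.Count
```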
Now to use MeasureGroup-Object:
$stats = MeasureGroup-Object Extension Length $files
The $stats variable will contain an array with one entry per extension. Each entry has Extension, Count and Sum properties; because the measured property is Length, Sum is the total size in bytes.
The following displays the results, but in no particular order, since they come straight out of the hash table:
$stats
Sorting makes it more interesting:
# Get Top 5 file types by count
$stats | Sort-Object Count -Descending | Select-Object -First 5
Extension Count        Sum
--------- -----        ---
.cs         842    3018481
.hxs        765  980371275
.cab        756 1801724080
.sql        538    6734980
.dll        367   79635976
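Sum is a raw byte count, which gets hard to read for large totals. A calculated property in Select-Object can show it in megabytes instead. The three rows below are copied from the output above purely as sample input (with real data you would pipe $stats), and SizeMB is an arbitrary column name.

```powershell
# Sample rows copied from the output above; in practice pipe $stats instead.
$sample = @(
    [pscustomobject]@{ Extension = '.cs';  Count = 842; Sum = 3018481 }
    [pscustomobject]@{ Extension = '.hxs'; Count = 765; Sum = 980371275 }
    [pscustomobject]@{ Extension = '.cab'; Count = 756; Sum = 1801724080 }
)

# Calculated property: convert the byte total to megabytes (1MB = 1048576 bytes).
$bySize = $sample |
    Sort-Object Sum -Descending |
    Select-Object Extension, Count,
        @{ Name = 'SizeMB'; Expression = { [math]::Round($_.Sum / 1MB, 2) } }

$bySize
```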
A couple of other useful views:
# Get Top 5 file types by total Length
$stats | Sort-Object Sum -Descending | Select-Object -First 5
# Display all Extensions
$stats | Sort-Object Extension
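Adding up Count and Sum across all the rows gives overall totals, a handy cross-check against the figures for the whole drive. The sample rows below are copied from the output above; with real data you would pipe $stats instead, and the $totalFiles and $totalBytes names are just illustrative.

```powershell
# Sample rows copied from the output above; with real data use $stats directly.
$sample = @(
    [pscustomobject]@{ Extension = '.cs';  Count = 842; Sum = 3018481 }
    [pscustomobject]@{ Extension = '.sql'; Count = 538; Sum = 6734980 }
    [pscustomobject]@{ Extension = '.dll'; Count = 367; Sum = 79635976 }
)

# Total the per-extension counts and byte sizes.
$totalFiles = ($sample | Measure-Object -Property Count -Sum).Sum
$totalBytes = ($sample | Measure-Object -Property Sum -Sum).Sum

"{0} files, {1:N2} MB" -f $totalFiles, ($totalBytes / 1MB)
```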
Some comments about the implementation.
The end block loops through the $Aggregate hash table and builds a PSObject with note properties for each group. These PSObjects play nicely with Sort-Object, Select-Object and the default display.
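To make the PSObject point concrete, here is the note-property approach used in the end block above next to the [pscustomobject] shortcut available in PowerShell 3.0 and later. The values are sample data; the sketch only shows that both routes produce an object with the same note properties.

```powershell
# PowerShell 1.0/2.0-compatible route, as in the end block:
# create an empty PSObject and attach note properties one at a time.
$old = New-Object System.Management.Automation.PSObject
$old.psobject.Members.Add((New-Object System.Management.Automation.PSNoteProperty 'Extension', '.cs'))
$old.psobject.Members.Add((New-Object System.Management.Automation.PSNoteProperty 'Count', 842))

# PowerShell 3.0+: casting a hash table literal to [pscustomobject] builds the
# same kind of object in one step and keeps the properties in written order.
$new = [pscustomobject]@{ Extension = '.cs'; Count = 842 }

# Both expose the same note properties, so Sort-Object, Select-Object and
# the default formatter treat them alike.
$old.psobject.Properties.Name -join ','
$new.psobject.Properties.Name -join ','
```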