Why PowerShell workflow is significantly slower than non-workflow script for XML file analysis

foreach -parallel is by far the slowest parallelization option you have in PowerShell. Workflows are not designed for speed, but for long-running operations that can be safely interrupted and resumed.

The implementation of these safety mechanisms introduces some overhead, which is why your script is slower when run as a workflow.

If you want to optimize for execution speed, use runspaces instead:

$TestDir = "E:\Powershell\Test"
$TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

# Set up runspace pool
$RunspacePool = [runspacefactory]::CreateRunspacePool(1,10)
$RunspacePool.Open()

# Assign new jobs/runspaces to a variable
$Runspaces = foreach ($TestXML in $TestXMLs)
{
    # Create new PowerShell instance to hold the code to execute, add arguments
    $PSInstance = [powershell]::Create().AddScript({
        param($XMLPath)

        [xml]$XML = Get-Content $XMLPath
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }).AddParameter('XMLPath', $TestXML.FullName)

    # Assign PowerShell instance to RunspacePool
    $PSInstance.RunspacePool = $RunspacePool

    # Start executing asynchronously, keep instance + IAsyncResult objects
    New-Object psobject -Property @{
        Instance = $PSInstance
        IAResult = $PSInstance.BeginInvoke()
        Argument = $TestXML
    }
}

# Wait for the runspace jobs to complete
while($Runspaces |Where-Object{-not $_.IAResult.IsCompleted})
{
    Start-Sleep -Milliseconds 500
}

# Collect the results
$Results = $Runspaces |ForEach-Object {
    $Output = $_.Instance.EndInvoke($_.IAResult)
    New-Object psobject -Property @{
        File = $_.Argument
        ServerID = $Output
    }
}
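
The script above never releases the PowerShell instances or the pool. A minimal sketch of one way to clean up after collecting results, using a trivial script block in place of the XML work (the names $Pool, $Jobs, $Doubled and the doubling logic are placeholders of mine, not part of the original script):

```powershell
# Small pool; the doubling script block is a stand-in for the real per-file work
$Pool = [runspacefactory]::CreateRunspacePool(1, 2)
$Pool.Open()

$Jobs = foreach ($Number in 1..3) {
    $Instance = [powershell]::Create().AddScript({
        param($Value)
        $Value * 2
    }).AddParameter('Value', $Number)
    $Instance.RunspacePool = $Pool

    New-Object psobject -Property @{
        Instance = $Instance
        IAResult = $Instance.BeginInvoke()
        Argument = $Number
    }
}

$Doubled = foreach ($Job in $Jobs) {
    try {
        # EndInvoke() blocks until this instance has finished;
        # @() copies its output into a plain array before the instance is disposed
        $Output = @($Job.Instance.EndInvoke($Job.IAResult))
        New-Object psobject -Property @{
            Argument = $Job.Argument
            Result   = $Output
        }
    }
    finally {
        # Release the instance whether or not EndInvoke() threw
        $Job.Instance.Dispose()
    }
}

# Release the pool once every instance has been collected
$Pool.Close()
$Pool.Dispose()
```

Because EndInvoke() waits for completion by itself, the polling loop is optional in this variant. It is still handy if you want to report progress or enforce a timeout.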

Fast XML processing bonus tips:

As wOxxOm suggests, using Xml.Load() is way faster than using Get-Content to read in the XML document.
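
If you want to check the difference on your own machine, something along these lines should do. It is only a rough sketch: the temp file, the 5000 generated server nodes and the variable names are made up for the measurement, and the timings will vary with file size and hardware.

```powershell
# Build a throwaway XML file with hypothetical content (5000 <server> nodes)
$TempFile = [System.IO.Path]::GetTempFileName()
$Lines = @('<root><servers>')
$Lines += 1..5000 | ForEach-Object { '<server name="Server{0}" serverid="{0}" />' -f $_ }
$Lines += '</servers></root>'
Set-Content -Path $TempFile -Value $Lines

# Approach 1: Get-Content emits one string per line, then the [xml] cast parses them
$ViaGetContent = Measure-Command {
    [xml]$XmlA = Get-Content $TempFile
}

# Approach 2: Load() reads and parses the file directly
$ViaLoad = Measure-Command {
    $XmlB = New-Object Xml
    $XmlB.Load($TempFile)
}

Remove-Item $TempFile

'Get-Content + [xml] cast: {0:N1} ms' -f $ViaGetContent.TotalMilliseconds
'Xml.Load():               {0:N1} ms' -f $ViaLoad.TotalMilliseconds
```

A single run is noisy, so repeat it a few times before drawing conclusions.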

Furthermore, combining dot notation ($xml.root.servers.server) with the Where({}) extension method is also going to be painfully slow if the document contains many server nodes. Use the SelectNodes() method with an XPath expression to search for “Server1” instead (be aware that XPath is case-sensitive):

$PSInstance = [powershell]::Create().AddScript({
    param($XMLPath)

    $XML = New-Object Xml
    $XML.Load($XMLPath)
    $Server1Node = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')
    return $Server1Node.serverid
}).AddParameter('XMLPath', $TestXML.FullName)
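
To illustrate the case-sensitivity caveat, here is a small sketch against an inline sample document. The sample data is invented, and it assumes name and serverid are attributes of each server element, as the XPath above implies. The translate() call is a standard XPath 1.0 function, used here as one possible workaround when the casing of the name is not reliable:

```powershell
# Hypothetical sample document; note the two differently cased "server1" entries
$XML = New-Object Xml
$XML.LoadXml(@'
<root>
  <servers>
    <server name="Server1" serverid="101" />
    <server name="server1" serverid="102" />
    <server name="Server2" serverid="201" />
  </servers>
</root>
'@)

# Exact-case match: only name="Server1" is selected
$Exact = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')

# Case-insensitive match: translate() lower-cases the attribute before comparing
$Upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$Lower = 'abcdefghijklmnopqrstuvwxyz'
$XPath = "/root/servers/server[translate(@name, '$Upper', '$Lower') = 'server1']"
$Loose = $XML.SelectNodes($XPath)

# Collect the ids from each node list
$ExactIds = @($Exact | ForEach-Object { $_.serverid })
$LooseIds = @($Loose | ForEach-Object { $_.serverid })
```

With this sample, $ExactIds holds only 101, while $LooseIds holds 101 and 102.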
