PowerShell, foreach, and ForEach-Object

Wat

An interesting discussion cropped up at work recently regarding looping constructs in Windows PowerShell. The context was a minor code change (replacing ForEach-Object with Foreach in two lines); the claim was that Foreach was much faster than ForEach-Object, with the change saving many seconds per script invocation.

I was not familiar with the script, and I am a relative newcomer to PowerShell, so the claim sounded odd but plausible. The obvious question arose in my mind: how could such a minor change in code result in such a large change in run time?

Loop Construct versus Cmdlet

I started with the first link in the pull request supporting the change. It goes into some detail on how a developer improved the performance of a script, and it reads in part:
The difference between the ForEach loop construct and ForEach-Object cmdlet is not well understood by those beginning to write Windows PowerShell scripts. This is because they initially appear to be the same. However, they are different in how they handle the loop object enumeration (among other things), and it is worth understanding the difference between the two—especially when you are trying to speed up your scripts.
Digging further, the foreach loop construct looks like this:
foreach ($object in $objects) { commands ... }
And the cmdlet version of foreach looks like this:
dir C:\ | ForEach-Object { $_.Name }
Superficially, they look similar, but there are hints in the syntax that make them visually distinct:
  • The cmdlet version looks like " | ForEach-Object { commands } "
    • it has a pipe
    • it has the objects before the "ForEach-Object"
  • The loop construct looks like " foreach ($obj in $objects) { commands } "
    • it does not have a pipe
    • its objects come after the foreach
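To make the two shapes concrete, here is a minimal sketch that runs both forms over the same small collection. The sample strings and the variable names ($objects, $fromLoop, $fromCmdlet) are made up for illustration and do not come from the script under discussion.

```powershell
# Hypothetical sample data, chosen only for illustration.
$objects = 'alpha', 'beta', 'gamma'

# Loop construct: a language keyword, no pipe, and the collection
# appears after "foreach" inside the parentheses.
$fromLoop = foreach ($object in $objects) { $object.Length }

# Cmdlet: a command on the right-hand side of a pipe, with the
# collection arriving from the left; $_ is the current object.
$fromCmdlet = $objects | ForEach-Object { $_.Length }

# Both variables now hold the same three lengths.
$fromLoop
$fromCmdlet
```

The results are identical; only the way the objects reach the loop body differs.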

Aliases

PowerShell muddies the waters by allowing you to define aliases for your cmdlets. The ForEach-Object cmdlet comes with two predefined aliases: Foreach and %. Don't be fooled, though. When you see this command:
dir C:\ | Foreach { $_.Name }
You're still using the cmdlet version of foreach, not the foreach loop construct. We know this because (1) it has a pipe, and (2) the objects (from the dir command in this case) come before our ForEach. Even though it looks like the looping construct, this is the cmdlet alias, which is not the same thing.
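If in doubt, you can ask PowerShell which aliases point at the cmdlet. The following is a small sketch using Get-Alias; the $aliases variable name is only for illustration.

```powershell
# List every alias that resolves to the ForEach-Object cmdlet.
$aliases = Get-Alias -Definition ForEach-Object
$aliases | Select-Object Name, Definition

# Resolve each alias by name. Quoting keeps "foreach" a plain string
# rather than something the parser could read as the keyword.
(Get-Alias -Name 'foreach').Definition
(Get-Alias -Name '%').Definition
```

In a default session both lookups report ForEach-Object. The loop construct never shows up here because it is a language keyword, not a command.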

What's the difference?

Apart from the syntax surrounding the ForEach variants, people claim that there are real differences between the cmdlet form and the loop construct form of ForEach:
  • Memory use. The claim is that the loop construct version of ForEach evaluates its loop array fully before starting the loop. This causes memory consumption not seen in the cmdlet version, which handles objects as they arrive.
  • Delayed start of loop. Because the collection is evaluated fully first, the loop body cannot run until the last object has been produced.
  • Speed. According to this blog post, Bruce Payette (PowerShell development lead) says that there are optimizations to the ForEach loop construct that allow it to outperform the cmdlet in some circumstances.
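The first two claims can be observed directly by logging the order of events. The sketch below assumes PowerShell 5 or later (for the ::new() syntax); Get-Numbers and the log variables are hypothetical names invented for this illustration.

```powershell
# Shared log so the producer and the consumer can record their order.
$log = [System.Collections.Generic.List[string]]::new()

# Hypothetical producer that notes each object as it emits it.
function Get-Numbers {
    foreach ($i in 1..3) {
        $log.Add("produce $i")
        $i
    }
}

# Cmdlet form: each object is consumed as soon as it is produced.
Get-Numbers | ForEach-Object { $log.Add("consume $_") }
$streamLog = $log -join ', '

$log.Clear()

# Loop construct: Get-Numbers runs to completion before the body starts.
foreach ($n in Get-Numbers) { $log.Add("consume $n") }
$loopLog = $log -join ', '

$streamLog
$loopLog
```

The cmdlet form interleaves "produce" and "consume" entries, while the loop construct logs all three "produce" entries before the first "consume". With three integers this is harmless; with a large or slow producer it is where the memory and delayed-start differences come from.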

Back to the Beginning

With this basic knowledge under our belts, we can revisit the original claim that this code:
dir C:\ | ForEach { $_.Name }
is significantly faster than this code:
dir C:\ | ForEach-Object { $_.Name }
Any difference between the two would be entirely due to how PowerShell invokes aliases of cmdlets versus the cmdlets themselves. This also suggests an interesting corollary: if there is a difference between the above versions, the "%" alias for ForEach-Object should outperform ForEach-Object as well!

Looking at the benchmark done here, and performing my own benchmark, I am confident that for this use case, there is little performance difference between the cmdlet and the loop construct ForEach, and even less (if any) difference between the cmdlet aliases.
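For anyone who wants to try a comparison like this, here is a rough sketch built on Measure-Command. It is not the benchmark referenced above: the workload (doubling 100,000 integers) and all variable names are assumptions chosen for illustration, and the numbers will vary by machine and PowerShell version.

```powershell
# Assumed workload: an in-memory array, so disk and provider costs
# do not dominate the measurement.
$items = 1..100000

# Cmdlet under its full name and under both predefined aliases.
$cmdlet  = Measure-Command { $items | ForEach-Object { $_ * 2 } }
$alias   = Measure-Command { $items | foreach { $_ * 2 } }
$percent = Measure-Command { $items | % { $_ * 2 } }

# Loop construct: no pipe, so "foreach" here is the language keyword.
$loop    = Measure-Command { foreach ($i in $items) { $i * 2 } }

# Collect the timings into one table for easy comparison.
$results = @(
    [pscustomobject]@{ Form = 'ForEach-Object'; Milliseconds = $cmdlet.TotalMilliseconds }
    [pscustomobject]@{ Form = 'foreach (alias)'; Milliseconds = $alias.TotalMilliseconds }
    [pscustomobject]@{ Form = '% (alias)'; Milliseconds = $percent.TotalMilliseconds }
    [pscustomobject]@{ Form = 'foreach (loop)'; Milliseconds = $loop.TotalMilliseconds }
)
$results | Format-Table -AutoSize
```

A single run is noisy, so repeat each measurement several times and compare averages before drawing any conclusion.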

Conclusions

The facts:
  • The cmdlet and loop construct versions are distinct entities in PowerShell.
  • Many people get the variations in ForEach mixed up.
  • There are substantiated claims of real performance differences between the cmdlet and loop construct versions of ForEach.
  • Casual PowerShell users are unlikely to see any important differences between the two; any time spent worrying about the two is likely to be premature optimization.
  • If your teammate is convinced that the ForEach cmdlet alias is faster than the ForEach-Object cmdlet, that teammate is certainly confused about the various forms ForEach can take in PowerShell.

Resources

  1. https://blogs.technet.microsoft.com/heyscriptingguy/2014/05/18/weekend-scripter-powershell-speed-improvement-techniques/
  2. http://poshoholic.com/2007/08/21/essential-powershell-understanding-foreach/
  3. http://powershelladministrator.com/2015/11/15/speed-of-loops-and-different-ways-of-writing-to-files-which-is-the-quickest/
  4. https://www.pluralsight.com/blog/it-ops/what-you-need-to-know-about-foreach-loops-in-powershell
  5. http://zduck.com/2013/benchmarking-with-Powershell/
  6. https://technet.microsoft.com/en-us/library/hh847816.aspx
  7. https://technet.microsoft.com/en-us/library/hh849731.aspx

