LINQ performance FAQ

Linq, as a built-in technology, has performance advantages and disadvantages. The code behind the extension methods has had considerable performance attention paid to it by the .NET team, and its ability to provide lazy evaluation means that the cost of performing most manipulations on a set of objects is spread across the larger algorithm requiring the manipulated set. However, there are some things you need to know that can make or break your code’s performance.

First and foremost, Linq doesn’t magically save your program the time or memory needed to perform an operation; it just may delay those operations until absolutely needed. OrderBy() performs a QuickSort, which will take nlogn time just the same as if you’d written your own QuickSorter or used List.Sort() at the right time. So, always be mindful of what you’re asking Linq to do to a series when writing queries; if a manipulation is not necessary, look to restructure the query or method chain to avoid it.

By the same token, certain operations (sorting, grouping, aggregates) require knowledge of the entire set they are acting upon. The very last element in a series could be the first one the operation must return from its iterator. On top of that, because Linq operations should not alter their source enumerable, but many of the algorithms they use will (i.e. in-place sorts), these operations end up not only evaluating, but copying the entire enumerable into a concrete, finite structure, performing the operation, and yielding through it. So, when you use OrderBy() in a statement, and you ask for an element from the end result, EVERYTHING that the IEnumerable given to it can produce is evaluated, stored in memory as an array, sorted, then returned one element at a time. The moral is, any operation that needs a finite set instead of an enumerable should be placed as late in the query as possible, allowing for other operations like Where() and Select() to reduce the cardinality and memory footprint of the source set.

Lastly, Linq methods drastically increase the call stack size and memory footprint of your system. Each operation that must know of the entire set keeps the entire source set in memory until the last element has been iterated, and the evaluation of each element will involve a call stack at least twice as deep as the number of methods in your chain or clauses in your inline statement (a call to each iterator’s MoveNext() or yielding GetEnumerator, plus at least one call to each lambda along the way). This is simply going to result in a larger, slower algorithm than an intelligently-engineered inline algorithm that performs the same manipulations. Linq’s main advantage is code simplicity. Creating, then sorting, a dictionary of lists of groups values is not very easy-to-understand code (trust me). Micro-optimizations can obfuscate it further. If performance is your primary concern, then don’t use Linq; it will add approximately 10% time overhead and several times the memory overhead of manipulating a list in-place yourself. However, maintainability is usually the primary concern of developers, and Linq DEFINITELY helps there.

On the performance kick: If performance of your algorithm is the sacred, uncompromisable first priority, you’d be programming in an unmanaged language like C++; .NET is going to be much slower just by virtue of it being a managed runtime environment, with JIT native compilation, managed memory and extra system threads. I would adopt a philosophy of it being “good enough”; Linq may introduce slowdowns by its nature, but if you can’t tell the difference, and your client can’t tell the difference, then for all practical purposes there is no difference. “Premature optimization is the root of all evil”; Make it work, THEN look for opportunities to make it more performant, until you and your client agree it’s good enough. It could always be “better”, but unless you want to be hand-packing machine code, you’ll find a point short of that at which you can declare victory and move on.

Leave a Comment