Client side GroupBy is not supported

It seems like there is a common misconception about what LINQ GroupBy does and what SQL GROUP BY is able to do. Since I fell into the exact same trap and had to wrap my head around this recently, I decided to write a more thorough explanation of this issue.

Short answer:

The LINQ GroupBy is much different from the SQL GROUP BY statement: LINQ just divides the underlying collection into chunks depending on a key, while SQL additionally applies an aggregation function to condense each of these chunks down into a single value.

This is why EF has to perform your LINQ-kind GroupBy in memory.

Before EF Core 3.0, this was done implicitly, so EF downloaded all result rows and then applied the LINQ GroupBy. However, this implicit behavior might let the programmer expect that the entire LINQ query is executed in SQL, with potentially enormous performance impact when the result set is rather large. For this reason, implicit client side evaluation of GroupBy was disabled completely in EF Core 3.0.

Now it is required to explicitly call functions like .AsEnumerable() or .ToList(), which download the result set and continue with in-memory LINQ operations.

Long answer:

The following table solvedExercises will be the running example for this answer:

+-----------+------------+
| StudentId | ExerciseId |
+-----------+------------+
|         1 |          1 |
|         1 |          2 |
|         2 |          2 |
|         3 |          1 |
|         3 |          2 |
|         3 |          3 |
+-----------+------------+

A record X | Y in this table denotes that student X has solved exercise Y.

In the question, a common use case of LINQ’s GroupBy method is described: Take a collection and group it into chunks, where the rows in each chunk share a common key.

In our example, we might want to get a Dictionary<int, List<int>>, which contains a list of solved exercises for each student. With LINQ, this is very straightforward:

var result = solvedExercises
    .GroupBy(e => e.StudentId)
    .ToDictionary(e => e.Key, e => e.Select(e2 => e2.ExerciseId).ToList());

Output (for full code see dotnetfiddle):

Student #1: 1 2 
Student #2: 2 
Student #3: 1 2 3

This is easy to represent with C# datatypes, since we can nest List and Dictionary as deep as we like to.

Now we try to imagine this as an SQL query result. SQL query results are usually represented as a table, where we can freely choose the returned columns. To represent our above query as SQL query result, we would need to

generate multiple result tables,
put the grouped rows into an array or
somehow insert a “result set separator”.

As far as I know, none of these approaches is implemented in practice. At most, there are some hacky work-arounds like MySQL’s GROUP_CONCAT, which allows to combine the result rows into a string (relevant SO answer).

Thus we see, that SQL cannot yield results that match LINQ’s notion of GroupBy.

Instead, SQL only allows so-called aggregation: If we, for example, wanted to count how many exercises have been passed by a student, we would write

SELECT StudentId,COUNT(ExerciseId)
FROM solvedExercises
GROUP BY StudentId

…which will yield

+-----------+-------------------+
| StudentId | COUNT(ExerciseId) |
+-----------+-------------------+
|         1 |                 2 |
|         2 |                 1 |
|         3 |                 3 |
+-----------+-------------------+

Aggregation functions reduce a set of rows into a single value, usually a scalar. Examples are row count, sum, maximum value, minimum value, and average.

This is implemented by EF Core: Executing

var result = solvedExercises
    .GroupBy(e => e.StudentId)
    .Select(e => new { e.Key, Count = e.Count() })
    .ToDictionary(e => e.Key, e => e.Count);

generates the above SQL. Note the Select, which tells EF which aggregation function it should use for the generated SQL query.

In summary, the LINQ GroupBy function is much more general than the SQL GROUP BY statement, which due to SQL’s restrictions only allows to return a single, two dimensional result table. Thus, queries like the one in the question and the first example in this answer have to be evaluated in memory, after downloading the SQL result set.

Instead of implicitly doing this, in EF Core 3.0 the developers chose to throw an exception in this case; this prevents accidental downloading of an entire, potentially large table with millions of rows, which might get unnoticed during development due to a small test database.

More Related Contents:

Leave a Comment Cancel reply