Entity Framework Include OrderBy random generates duplicate data

As one can sort it out by reading AakashM answer and Nicolae Dascalu answer, it strongly seems Linq OrderBy requires a stable ranking function, which NewID/Guid.NewGuid is not.

So we have to use another random generator that would be stable inside a single query.

To achieve this, before each querying, use a .Net Random generator to get a random number. Then combine this random number with a unique property of the entity to get randomly sorted. And to ‘randomize’ a bit the result, checksum it. (checksum is a SQL Server function that compute a hash; original idea founded on this blog.)

Assuming Person Id is an int, you could write your query this way :

// Random instances should be stored and reused, not instanciated at each usage.
// But beware, it is not thread safe. If you want to share it between threads, you
// would have to use locks, see its documentation.
// https://learn.microsoft.com/en-us/dotnet/api/system.random.
// But using locks is a bad idea for scalability, especially in a Web context.
var randomGenerator = new Random();
// ...

var rnd = randomGenerator.NextDouble();
var persons = db.Persons
    .Include(p => p.Addresses)
    .OrderBy(p => SqlFunctions.Checksum(p.Id * rnd));

Like the NewGuid hack, this is very probably not a good random generator with a good distribution and so on. But it does not cause entities to get duplicated in results.

Beware:
If your query ordering does not guarantees uniqueness of your entities ranking, you must complement it for guarantying it. By example, if you use a non-unique property of your entities for the checksum call, then add something like .ThenBy(p => p.Id) after the OrderBy.
If your ranking is not unique for your queried root entity, its included children may get mixed with children of other entities having the same ranking. And then the bug will stay here.

Note:
I would prefer use .Next() method to get an int then combine it through a xor (^) to an entity int unique property, rather than using a double and multiply it. But SqlFunctions.Checksum unfortunately does not provide an overload for int data type, though the SQL server function is supposed to support it. You may use a cast to overcome this, but for keeping it simple I finally had chosen to go with the multiply.

Leave a Comment