nvarchar concatenation / index / nvarchar(max) inexplicable behavior

TLDR; This is not a documented/supported approach for concatenating strings across rows. It sometimes works but also sometimes fails as it depends what execution plan you get.

Instead use one of the following guaranteed approaches

SQL Server 2017+

SELECT @a = STRING_AGG([msg], '') WITHIN GROUP (ORDER BY [priority] ASC)
FROM bla
where   autofix = 0

SQL Server 2005+

SELECT @a = (SELECT [msg] + ''
             FROM   bla
             WHERE  autofix = 0
             ORDER  BY [priority] ASC
             FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)') 

Background

The KB article already linked by VanDerNorth does include the line

The correct behavior for an aggregate concatenation query is
undefined.

but then goes on to muddy the waters a bit by providing a workaround that does seem to indicate deterministic behavior is possible.

In order to achieve the expected results from an aggregate
concatenation query, apply any Transact-SQL function or expression to
the columns in the SELECT list rather than in the ORDER BY clause.

Your problematic query does not apply any expressions to columns in the ORDER BY clause.

The 2005 article Ordering guarantees in SQL Server… does state

For backwards compatibility reasons, SQL Server provides support for
assignments of type SELECT @p = @p + 1 … ORDER BY at the top-most
scope.

In the plans where the concatenation works as you expected the compute scalar with the expression [Expr1003] = Scalar Operator([@x]+[Expr1004]) appears above the sort.

In the plan where it fails to work the compute scalar appears below the sort. As explained in this connect item from 2006 when the expression @x = @x + [msg] appears below the sort it is evaluated for each row but all the evaluations end up using the pre assignment value of @x. In another similar Connect Item from 2006 the response from Microsoft spoke of “fixing” the issue.

The Microsoft Response on all the later Connect items on this issue (and there are many) state that this is simply not guaranteed

Example 1

we do not make any guarantees on the correctness of concatenation
queries (like using variable assignments with data retrieval in a
specific order). The query output can change in SQL Server 2008
depending on the plan choice, data in the tables etc. You shouldn’t
rely on this working consistently even though the syntax allows you to
write a SELECT statement that mixes ordered rows retrieval with
variable assignment.

Example 2

The behavior you are seeing is by design. Using assignment operations
(concatenation in this example) in queries with ORDER BY clause has
undefined behavior. This can change from release to release or even
within a particular server version due to changes in the query plan.
You cannot rely on this behavior even if there are workarounds. See
the below KB article for more details:
http://support.microsoft.com/kb/287515 The ONLY guaranteed
mechanism are the following:

  1. Use cursor to loop through the rows in specific order and concatenate the values
  2. Use for xml query with ORDER BY to generate the concatenated values
  3. Use CLR aggregate (this will not work with ORDER BY clause)

Example 3

The behavior you are seeing is actually by design. This has to do with
SQL being a set-manipulation language. All expressions in the SELECT
list (and this includes assignments too) are not guaranteed to be
executed exactly once for each output row. In fact, SQL query
optimizer tries hard to execute them as few times as possible. This
will give expected results when you are computing the value of the
variable based on some data in the tables, but when the value that you
are assigning depends on the previous value of the same variable, the
results may be quite unexpected. If the query optimizer moves the
expression to a different place in the query tree, it may get
evaluated less times (or just once, as in one of your examples). This
is why we don’t recommend using the “iteration” type assignments to
compute aggregate values. We find that XML-based workarounds … usually work well for the
customers

Example 4

Even without ORDER BY, we do not guarantee that @var = @var +
will produce the concatenated value for any statement
that affects multiple rows. The right-hand side of the expression can
be evaluated either once or multiple times during query execution and
the behavior as I said is plan dependent.

Example 5

The variable assignment with SELECT statement is a proprietary syntax
(T-SQL only) where the behavior is undefined or plan dependent if
multiple rows are produced. If you need to do the string concatenation
then use a SQLCLR aggregate or FOR XML query based concatenation or
other relational methods.

Leave a Comment