Know when to retry or fail when calling SQL Server from C#?

One single SqlException (may) wraps multiple SQL Server errors. You can iterate through them with Errors property. Each error is SqlError:

foreach (SqlError error in exception.Errors)

Each SqlError has a Class property you can use to roughly determine if you can retry or not (and in case you retry if you have to recreate connection too). From MSDN:

  • Class < 10 is for errors in information you passed then (probably) you can’t retry if first you don’t correct inputs.
  • Class from 11 to 16 are “generated by user” then probably again you can’t do anything if user first doesn’t correct his inputs. Please note that class 16 includes many temporary errors and class 13 is for deadlocks (thanks to EvZ) so you may exclude these classes if you handle them one by one.
  • Class from 17 to 24 are generic hardware/software errors and you may retry. When Class is 20 or higher you have to recreate connection too. 22 and 23 may be serious hardware/software errors, 24 indicates a media error (something user should be warned but you may retry in case it was just a “temporary” error).

You can find a more detailed description of each class here.

In general if you handle errors with their class you won’t need to know exactly each error (using error.Number property or exception.Number which is just a shortcut for first SqlError in that list). This has the drawback that you may retry when it’s not useful (or error can’t be recovered). I’d suggest a two steps approach:

  • Check for known error codes (list error codes with SELECT * FROM master.sys.messages) to see what you want to handle (knowing how). That view contains messages in all supported languages so you may need to filter them by msglangid column (for example 1033 for English).
  • For everything else rely on error class, retrying when Class is 13 or higher than 16 (and reconnecting if 20 or higher).
  • Errors with severity higher than 21 (22, 23 and 24) are serious errors and little waiting won’t fix that problems (database itself may also be damaged).

One word about higher classes. How to handle these errors isn’t simple and it depends on many factors (including risk management for your application). As a simple first step I wouldn’t retry for 22, 23, and 24 when attempting a write operation: if database, file system or media are seriously damaged then writing new data may deteriorate data integrity even more (SQL Server is extremely careful to do not compromise DB for a query even in critical circumstances). A damaged server, it depends on your DB network architecture, might even be hot-swapped (automatically, after a specified amount of time, or when a specified trigger is fired). Always consult and work close to your DBA.

Strategy for retrying depends on error you’re handling: free resources, wait for a pending operation to complete, take an alternative action, etc. In general you should retry only if all errors are “retry-able”:

bool rebuildConnection = true; // First try connection must be open

for (int i=0; i < MaximumNumberOfRetries; ++i) {
    try {
        // (Re)Create connection to SQL Server
        if (rebuildConnection) {
            if (connection != null)
                connection.Dispose();

            // Create connection and open it...
        }

        // Perform your task

        // No exceptions, task has been completed
        break;
    }
    catch (SqlException e) {
        if (e.Errors.Cast<SqlError>().All(x => CanRetry(x))) {
            // What to do? Handle that here, also checking Number property.
            // For Class < 20 you may simply Thread.Sleep(DelayOnError);

            rebuildConnection = e.Errors
                .Cast<SqlError>()
                .Any(x => x.Class >= 20);

            continue; 
        }

        throw;
    }
}

Wrap everything in try/finally to properly dispose connection. With this simple-fake-naive CanRetry() function:

private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 };

private static bool CanRetry(SqlError error) {
    // Use this switch if you want to handle only well-known errors,
    // remove it if you want to always retry. A "blacklist" approach may
    // also work: return false when you're sure you can't recover from one
    // error and rely on Class for anything else.
    switch (error.Number) {
        // Handle well-known error codes, 
    }

    // Handle unknown errors with severity 21 or less. 22 or more
    // indicates a serious error that need to be manually fixed.
    // 24 indicates media errors. They're serious errors (that should
    // be also notified) but we may retry...
    return RetriableClasses.Contains(error.Class); // LINQ...
}

Some pretty tricky ways to find list of non critical errors here.

Usually I embed all this (boilerplate) code in one method (where I can hide all the dirty things done to create/dispose/recreate connection) with this signature:

public static void Try(
    Func<SqlConnection> connectionFactory,
    Action<SqlCommand> performer);

To be used like this:

Try(
    () => new SqlConnection(connectionString),
    cmd => {
             cmd.CommandText = "SELECT * FROM master.sys.messages";
             using (var reader = cmd.ExecuteReader()) {
                 // Do stuff
         }
    });

Please note that skeleton (retry on error) can be used also when you’re not working with SQL Server (actually it can be used for many other operations like I/O and network related stuff so I’d suggest to write a general function and to reuse it extensively).

Leave a Comment