Sorting a list of strings with a specific locale in Python

You could use PyICU's collator to avoid changing global settings:

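import icu  # PyICU

def sorted_strings(strings, locale=None):
    if locale is None:
        # No locale given: plain code point ordering.
        return sorted(strings)
    # The collator is local to this call; the process-wide locale is untouched.
    collator = icu.Collator.createInstance(icu.Locale(locale))
    return sorted(strings, key=collator.getSortKey)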

Example:

>>> L = [u'sandwiches', u'angel delight', u'custard', u'éclairs', u'glühwein']
>>> sorted_strings(L)
['angel delight', 'custard', 'glühwein', 'sandwiches', 'éclairs']
>>> sorted_strings(L, 'en_US')
['angel delight', 'custard', 'éclairs', 'glühwein', 'sandwiches']
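
The first result above puts 'éclairs' last because plain sorted() compares strings by Unicode code point, and 'é' (U+00E9) is greater than every ASCII letter. Here is a small stdlib-only sketch of that default behavior. The variable names are mine, and the non-ASCII letters are written as escapes only to make the code points explicit:

```python
# Plain sorted() compares code points, not locale collation order.
words = ["sandwiches", "angel delight", "custard", "\u00e9clairs", "gl\u00fchwein"]
by_code_point = sorted(words)

# U+00E9 is greater than any ASCII letter, so the word starting with it sorts last.
print(by_code_point)
print(ord("\u00e9") > ord("z"))
```

A collator for a locale such as en_US instead treats 'é' as a variant of 'e', which is why 'éclairs' moves between 'custard' and 'glühwein' in the second result.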

Disadvantage: it adds a dependency on the PyICU library, and the behavior is slightly different from locale.strcoll.


I don’t know how to get a locale.strxfrm function for a given locale name without changing the locale globally. As a hack, you could run your function in a separate child process:

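import multiprocessing

pool = multiprocessing.Pool()
# ...
# locale_aware_sort must be a module-level function so that it can be pickled
pool.apply(locale_aware_sort, [strings, loc])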

Disadvantage: might be slow and resource-hungry.
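
A minimal sketch of what the child-process approach could look like, using only the standard library. The body of locale_aware_sort and the helper sort_in_child are my assumptions about how the worker might be written, not code from the original answer. The locale name you pass must be installed on the system, otherwise locale.setlocale raises locale.Error.

```python
import locale
import multiprocessing


def locale_aware_sort(strings, loc):
    # Intended to run in a child process: the locale change below is global
    # to whichever process executes it, so it stays out of the parent.
    locale.setlocale(locale.LC_COLLATE, loc)
    return sorted(strings, key=locale.strxfrm)


def sort_in_child(strings, loc):
    # Hypothetical helper: one short-lived worker per call keeps the sketch
    # simple; a long-lived pool would amortize the startup cost.
    with multiprocessing.Pool(processes=1) as pool:
        return pool.apply(locale_aware_sort, (strings, loc))
```

Call sort_in_child from under an `if __name__ == "__main__":` guard, since some multiprocessing start methods re-import the main module in the child.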


Using an ordinary threading.Lock won’t work unless you control every place where locale-aware functions could be called from multiple threads. Such functions are not limited to the locale module; re is one example.


You could compile your function with Cython to synchronize access using the GIL. The GIL will make sure that no other Python code can be executed while your function is running.

Disadvantage: not pure Python
