Key-ordered dict in Python

“Random access O(1)” is an extremely exacting requirement, which basically imposes an underlying hash table, and I hope you do mean random READS only, because I think it can be mathematically proven that it’s impossible in the general case to have O(1) writes as well as O(N) ordered iteration.
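A sketch of that proof, under the usual comparison-model assumption (keys can only be ordered by comparing them, as with arbitrary Python objects): if a container offered O(1) writes (even just amortized) and O(N) ordered iteration, then inserting N distinct keys and iterating once would sort them in linear time.

```latex
% Assumption: comparison model, N distinct keys, O(1) cost per write (amortized suffices)
\begin{align*}
T_{\text{sort}}(N) &\le \underbrace{N \cdot O(1)}_{N \text{ writes}} + \underbrace{O(N)}_{\text{one ordered iteration}} = O(N), \\
T_{\text{sort}}(N) &\ge \log_2 (N!) = \Theta(N \log N) \quad \text{(comparison-sort lower bound)}.
\end{align*}
% The two bounds contradict each other for large N, so no such container exists.
```

Special key types, such as small bounded integers, can presumably dodge this bound with non-comparison methods like counting or radix sort, which is why the claim is limited to the general case.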

I don’t think you will find a pre-packaged container suited to your needs, because they are so extreme; if O(log N) access were acceptable, that would of course make all the difference in the world. To get the big-O behavior you want for reads and iterations, you’ll need to glue together two data structures, essentially a dict and a heap (or sorted list or tree), and keep them in sync. Although you don’t specify, I think you’ll only get amortized behavior of the kind you want, unless you’re truly willing to pay any performance hit on inserts and deletes, which is the literal implication of the specs you express but does seem a pretty unlikely real-life requirement.

For O(1) reads and amortized O(N) ordered iteration, just keep a list of all keys alongside a dict, and sort that list lazily, only when you iterate. E.g.:

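class Crazy(object):
    """Dict with O(1) key lookup and sorted-order iteration over its keys."""
    def __init__(self):
        self.d = {}          # key -> value, for O(1) reads
        self.L = []          # every key; in sorted order only when self.sorted is True
        self.sorted = True   # becomes False after a new key is appended
    def __getitem__(self, k):
        return self.d[k]
    def __setitem__(self, k, v):
        if k not in self.d:
            # New key: tack it on at the end (amortized O(1)); defer sorting to iteration
            self.L.append(k)
            self.sorted = False
        self.d[k] = v
    def __delitem__(self, k):
        del self.d[k]
        self.L.remove(k)     # O(N) scan; removing an item never breaks sorted order
    def __iter__(self):
        if not self.sorted:
            # Lazy re-sort: cheap with timsort when only a few keys were appended
            self.L.sort()
            self.sorted = True
        return iter(self.L)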

If you don’t like the “amortized O(N) order”, you can remove self.sorted and just repeat self.L.sort() in __setitem__ itself. That makes writes O(N log N) in general, of course (closer to O(N) given the timsort behavior described below), whereas the code above keeps new-key writes at O(1); deletions are O(N) either way, because of list.remove. Either approach is viable, and it’s hard to think of one as intrinsically superior to the other. If you tend to do a bunch of writes and then a bunch of iterations, the approach in the code above is best; if it’s typically one write, one iteration, another write, another iteration, then it’s just about a wash.
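A minimal sketch of that eager variant, using a hypothetical class name `EagerCrazy`: it is the lazy class with `self.sorted` dropped and `self.L.sort()` moved into `__setitem__`, so the key list is already in order whenever you iterate.

```python
class EagerCrazy(object):
    """Hypothetical name: the lazy version with self.sorted removed and
    self.L re-sorted inside __setitem__, so self.L is always in key order."""

    def __init__(self):
        self.d = {}   # key -> value, for O(1) reads
        self.L = []   # all keys, kept sorted after every write

    def __getitem__(self, k):
        return self.d[k]

    def __setitem__(self, k, v):
        if k not in self.d:
            self.L.append(k)
            self.L.sort()   # pay the ordering cost now, on every new key
        self.d[k] = v

    def __delitem__(self, k):
        del self.d[k]
        self.L.remove(k)    # removing an item leaves a sorted list sorted

    def __iter__(self):
        return iter(self.L)  # nothing to sort at iteration time


e = EagerCrazy()
for k in [5, 1, 3]:
    e[k] = str(k)
print(list(e))  # keys come back in sorted order
```

Since `self.L` is sorted before each append, every `sort()` call here is exactly the “sorted prefix plus one tacked-on item” case that timsort handles in roughly linear time.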

BTW, this takes shameless advantage of the unusual (and wonderful;-) performance characteristics of Python’s sort (aka “timsort”): among them, sorting a list that’s mostly sorted but with a few extra items tacked on at the end is basically O(N) (if the tacked-on items are few enough compared to the sorted prefix). I hear Java’s gaining this sort soon, as Josh Bloch was so impressed by a tech talk on Python’s sort that he started coding it for the JVM on his laptop then and there. Most systems (including, I believe, Jython as of today, and IronPython too) basically treat sorting as an O(N log N) operation, not taking advantage of “mostly ordered” inputs; “natural mergesort”, which Tim Peters fashioned into today’s timsort, is a wonder in this respect.
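One way to watch this happen, sketched with a hypothetical `Counted` wrapper that counts every `<` comparison `list.sort()` makes (Python’s sort only ever uses `<`): sort a long sorted prefix with five keys tacked on, then the very same keys shuffled. Exact counts vary across Python versions, so none are claimed here; on CPython the first count should stay close to N, while the shuffled one should be on the order of N log N.

```python
import random


class Counted:
    """Hypothetical helper: wraps a value and counts every < comparison."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_sort_comparisons(values):
    """Sort wrapped copies of values; return (comparisons used, sorted values)."""
    items = [Counted(v) for v in values]
    Counted.comparisons = 0
    items.sort()
    return Counted.comparisons, [c.value for c in items]


N = 10_000
prefix = list(range(0, 2 * N, 2))   # already sorted: 0, 2, 4, ...
tacked_on = [7, 3, 19_999, 1, 501]  # a few new keys appended at the end
mostly_sorted = prefix + tacked_on

shuffled = mostly_sorted[:]
random.Random(42).shuffle(shuffled)  # same keys, no useful order

near_cmps, near_result = count_sort_comparisons(mostly_sorted)
rand_cmps, rand_result = count_sort_comparisons(shuffled)

assert near_result == rand_result == sorted(mostly_sorted)
print(f"mostly sorted: {near_cmps} comparisons for {len(mostly_sorted)} items")
print(f"shuffled:      {rand_cmps} comparisons")
```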
