when does Python allocate new memory for identical strings?

Each implementation of the Python language is free to make its own tradeoffs in allocating immutable objects (such as strings) — either making a new one, or finding an existing equal one and using one more reference to it, are just fine from the language’s point of view. In practice, of course, real-world implementation strike reasonable compromise: one more reference to a suitable existing object when locating such an object is cheap and easy, just make a new object if the task of locating a suitable existing one (which may or may not exist) looks like it could potentially take a long time searching.

So, for example, multiple occurrences of the same string literal within a single function will (in all implementations I know of) use the “new reference to same object” strategy, because when building that function’s constants-pool it’s pretty fast and easy to avoid duplicates; but doing so across separate functions could potentially be a very time-consuming task, so real-world implementations either don’t do it at all, or only do it in some heuristically identified subset of cases where one can hope for a reasonable tradeoff of compilation time (slowed down by searching for identical existing constants) vs memory consumption (increased if new copies of constants keep being made).

I don’t know of any implementation of Python (or for that matter other languages with constant strings, such as Java) that takes the trouble of identifying possible duplicates (to reuse a single object via multiple references) when reading data from a file — it just doesn’t seem to be a promising tradeoff (and here you’d be paying runtime, not compile time, so the tradeoff is even less attractive). Of course, if you know (thanks to application level considerations) that such immutable objects are large and quite prone to many duplications, you can implement your own “constants-pool” strategy quite easily (intern can help you do it for strings, but it’s not hard to roll your own for, e.g., tuples with immutable items, huge long integers, and so forth).

Leave a Comment