Why does the `is` operator behave differently in a script vs the REPL?

When you run code from a .py script, the entire file is compiled into a single code object before execution. Because the compiler sees the whole file at once, CPython can make certain optimizations, such as storing the constant 300 once and reusing that same int instance everywhere it appears.

You can reproduce this in the REPL by compiling the source as a whole, which more closely resembles how a script is executed:

>>> source = """\
... a = 300
... b = 300
... print(a == b)
... print(a is b)  # prints True
... print("id(a) = %d, id(b) = %d" % (id(a), id(b)))  # they have the same address
... """
>>> code_obj = compile(source, filename="myscript.py", mode="exec")
>>> exec(code_obj)
True
True
id(a) = 140736953597776, id(b) = 140736953597776

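Some of these optimizations are fairly aggressive. If you change the script line b = 300 to b = 150 + 150, CPython still “folds” the expression into the constant 300 at compile time, so b ends up referring to the same object as a. If you’re interested in such implementation details, look in peephole.c and search for PyCode_Optimize and any mention of the “consts table”. Note that peephole.c exists only in older CPython releases; newer releases perform constant folding in other parts of the compiler.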
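The folding can be checked by inspecting the compiled code object directly. The following is a minimal sketch rather than anything from CPython's own documentation: the helper name consts_of and the filename "<folding-demo>" are made up for illustration, and the exact contents of co_consts vary between CPython versions.

```python
def consts_of(source):
    """Compile source as a whole module and return its consts table."""
    code_obj = compile(source, filename="<folding-demo>", mode="exec")
    return code_obj.co_consts


folded_source = "a = 300\nb = 150 + 150\n"

# The consts table holds the constants the module-level code needs.
# On CPython versions that fold 150 + 150 at compile time, 300 shows up
# here and the operand 150 may not appear at all.
print(consts_of(folded_source))

# Run the module-level code in a fresh namespace and compare the names.
scope = {}
exec(compile(folded_source, filename="<folding-demo>", mode="exec"), scope)
print(scope["a"] == scope["b"])  # value equality holds regardless of folding
print(scope["a"] is scope["b"])  # identity depends on the CPython version
```

Only the value comparison is guaranteed by the language. Whether the identity check succeeds is an implementation detail of the compiler in use.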

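In contrast, when you run code line by line directly in the REPL, it executes in a different context. Each line is compiled separately in “single” mode into its own code object, so constants are not shared between lines and this optimization is not available: the two 300 literals become two distinct int objects.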

>>> scope = {}
>>> lines = source.splitlines()
>>> for line in lines:
...     code_obj = compile(line, filename="<I'm in the REPL>", mode="single")
...     exec(code_obj, scope)
...
True
False
id(a) = 140737087176016, id(b) = 140737087176080
>>> scope['a'], scope['b']
(300, 300)
>>> id(scope['a']), id(scope['b'])
(140737087176016, 140737087176080)
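To see where the difference comes from, it can help to look at the code objects themselves. The sketch below uses only the built-in compile(); the filename "<repl-demo>" and the variable names are illustrative. It counts how many times the constant 300 is stored when each line gets its own code object, versus when the whole source is compiled at once.

```python
lines = ["a = 300", "b = 300"]

# One code object per line, as the REPL does in "single" mode.
per_line = [
    compile(line, filename="<repl-demo>", mode="single") for line in lines
]
for code_obj in per_line:
    print(code_obj.co_consts)

# One code object for the whole source, as happens for a script.
whole = compile("\n".join(lines) + "\n", filename="<repl-demo>", mode="exec")
print(whole.co_consts)

# Count the entries equal to 300 in each arrangement.
per_line_count = sum(c.co_consts.count(300) for c in per_line)
whole_count = whole.co_consts.count(300)
print(per_line_count, whole_count)
```

Each per-line code object carries its own consts table with its own entry for 300, while the whole-source code object needs only one entry that both assignments load.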
