Why can pandas DataFrames change each other?

This goes much deeper than DataFrames: you are thinking about Python variables the wrong way. Python variables are pointers, not buckets. That is to say, when you write

>>> y = [1, 2, 3]

you are not putting [1, 2, 3] into a bucket called y; rather, you are creating a pointer named y which points to [1, 2, 3].

When you then write

>>> x = y

you are not putting the contents of y into a bucket called x; you are creating a pointer named x which points to the same thing that y points to. Thus:

>>> x[1] = 100
>>> print(y)
[1, 100, 3]

Because x and y point to the same object, modifying it via one pointer modifies it for the other pointer as well. If you’d like to point to a copy instead, you need to create the copy explicitly. With lists, a slice gives you a (shallow) copy:

>>> y = [1, 2, 3]
>>> x = y[:]
>>> x[1] = 100
>>> print(y)
[1, 2, 3]
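
If you are ever unsure which situation you are in, the `is` operator tells you whether two names point to the same object, while `==` only compares contents. A minimal sketch (the names alias and copied are just for illustration):

```python
y = [1, 2, 3]

alias = y        # a second pointer to the same list
copied = y[:]    # a new list with the same contents

same_object = alias is y          # True: both names point to one list
equal_contents = copied == y      # True: the contents match
distinct_object = copied is not y # True: but it is a different object

print(same_object, equal_contents, distinct_object)
```

All three values come out True: alias is the very same list as y, whereas copied merely looks the same.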

With DataFrames, you can create a copy with the copy() method:

>>> df2 = df1.copy()
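
To see the difference with DataFrames, here is a small sketch. The data and the column name "a" are made up for illustration, and it assumes pandas is installed:

```python
import pandas as pd

# Hypothetical example data; the column name "a" is an assumption.
df1 = pd.DataFrame({"a": [1, 2, 3]})

alias = df1        # a second pointer to the same DataFrame
df2 = df1.copy()   # an independent copy (copy() is deep by default)

alias.loc[0, "a"] = 100   # changes the object that df1 also points to
df2.loc[1, "a"] = -1      # changes only the copy

print(df1["a"].tolist())  # [100, 2, 3]
print(df2["a"].tolist())  # [1, -1, 3]
```

The change made through alias shows up in df1, while the change made to df2 does not.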
