In pandas, is inplace = True considered harmful, or not?
Yes, it is. Not just harmful. Quite harmful. This GitHub issue is proposing the inplace argument be deprecated api-wide sometime in the near future. In a nutshell, here’s everything wrong with the inplace argument:
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in-place
The pain points above are all common pitfalls for beginners, so removing this option will simplify the API greatly.
Let's take a look at the points above in more depth.
Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True (there are rare exceptions, but they are mostly a result of implementation details in the library and should not be used as a crutch to advocate for this argument’s usage). Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
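To check the performance claim on your own machine, here is a minimal timing sketch. The sample data and the helper names out_of_place and in_place are made up for illustration, and sort_values is just one example method; the timings will vary with the pandas version and hardware, so treat the numbers as a rough comparison only.

```python
import timeit

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": range(100_000, 0, -1)})


def out_of_place():
    # sort_values returns a new, sorted DataFrame.
    return df.copy().sort_values("a")


def in_place():
    tmp = df.copy()
    # With inplace=True, tmp itself ends up holding the sorted data.
    tmp.sort_values("a", inplace=True)
    return tmp


# Both helpers start from a fresh copy, so the comparison is like for like.
t_out = timeit.timeit(out_of_place, number=20)
t_in = timeit.timeit(in_place, number=20)
print(f"out-of-place: {t_out:.4f}s, in-place: {t_in:.4f}s")
```

Both helpers produce the same sorted frame, so any difference you see is purely the cost of the call itself.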
Method Chaining
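inplace=True also hinders method chaining, because a call made with inplace=True returns None instead of the modified object, so there is nothing to chain the next call onto. Contrast the working of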
result = df.some_function1().reset_index().some_function2()
As opposed to
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
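For a version of this contrast that actually runs, here is a small sketch with real pandas methods standing in for the some_function1 and some_function2 placeholders. The column names and data are invented for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# Chained: each step returns a new object that feeds the next call.
result = (
    df.groupby("key")["val"].sum()
      .reset_index()
      .sort_values("val", ascending=False)
)

# With inplace=True, the same pipeline has to be split into statements.
temp = df.groupby("key")["val"].sum().to_frame()
temp.reset_index(inplace=True)
temp.sort_values("val", ascending=False, inplace=True)

print(result)
print(temp)
```

Both versions end up with the same frame; the chained one just needs no intermediate variable that is mutated step by step.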
Unintended Pitfalls
One final caveat to keep in mind is that calling a method with inplace=True on a column of a filtered DataFrame can trigger the SettingWithCopyWarning:
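import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]  # filtered subset of df
# In-place replace on a column selected from the filtered subset
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame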
This can cause unexpected behavior: as noted above, the column may sometimes fail to be updated in-place.
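One way to sidestep the problem (a sketch, not the only option) is to take an explicit copy of the filtered frame and assign the result of replace back to the column instead of passing inplace=True:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Take an explicit copy so df2 is unambiguously its own object.
df2 = df[df["a"] > 1].copy()

# Assign the result back rather than relying on inplace=True.
df2["b"] = df2["b"].replace({"x": "abc"})

print(df2)
```

Here df2 is updated as intended and the original df is left untouched.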