Pandas: Chained assignments [duplicate]

The point of SettingWithCopyWarning is to warn you that you may be doing something that will not update the original DataFrame as you might expect.

Here, data is a DataFrame, possibly of a single dtype (or not). You are taking a reference to data['amount'], which is a Series, and updating it. This probably works in your case because the values you write have the same dtype as the data that already existed.

However, the operation could instead create a copy and update that copy of data['amount'], which you would never see. Then you would be wondering why the original is not updating.
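A minimal sketch of the silent-copy case, using a made-up frame (the column names follow the question, the values are invented). Selecting rows with a boolean mask builds a new DataFrame, so the chained assignment lands on a temporary object and the original stays as it was:

```python
import warnings

import pandas as pd

# Hypothetical data; only the column names come from the question.
data = pd.DataFrame({"num": [1, 1, 2], "amount": [10.0, 20.0, 30.0]})

with warnings.catch_warnings():
    # Silence the chained-assignment warning so the demo stays quiet.
    warnings.simplefilter("ignore")
    # data[mask] builds a new DataFrame; the assignment hits that temporary.
    data[data["num"] == 1]["amount"] = 0.0

untouched = data["amount"].tolist()  # the original values are still there

# A single .loc call on the original frame performs the update.
data.loc[data["num"] == 1, "amount"] = 0.0
updated = data["amount"].tolist()
```

The single .loc call works because it is one indexing operation on data itself, with no intermediate object in between.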

Pandas returns a copy of an object in almost all method calls. The inplace operations are a convenience that works, but in general they do not make it clear that data is being modified, and they could potentially operate on copies.

It is much clearer to do this:

data['amount'] = data["amount"].fillna(data.groupby("num")["amount"].transform("mean"))

data["amount"] = data['amount'].fillna(mean_avg)
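As a self-contained illustration of the first form, here is a sketch on invented sample data (the values are assumptions, not taken from the question). It fills each missing amount with the mean of its num group and assigns the result back to the column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "num": [1, 1, 1, 2, 2],
    "amount": [10.0, 20.0, np.nan, 4.0, np.nan],
})

# Per-row group mean, aligned to data's index (mean skips NaN values).
group_mean = data.groupby("num")["amount"].transform("mean")

# fillna returns a new Series; assigning it back updates the column explicitly.
data["amount"] = data["amount"].fillna(group_mean)
filled = data["amount"].tolist()
```

Because the assignment targets data['amount'] directly, there is no question about which object ends up modified.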

One further plus to working on copies: you can chain operations, which is not possible with inplace ones.

e.g.

data['amount'] = data['amount'].fillna(mean_avg)*2
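A small sketch of that chaining on a standalone Series. The fill value 0.0 is a stand-in for mean_avg, which is not shown here, and the sample values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical values; 0.0 stands in for the fill value.
amount = pd.Series([1.0, np.nan, 3.0])

# Each call returns a new Series, so the calls can be chained.
doubled = amount.fillna(0.0).mul(2).tolist()

# The original Series is not modified by the chain.
still_missing = int(amount.isna().sum())
```

Nothing changes until you assign the result back, which keeps the modification visible in the code.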

And just an FYI: inplace operations are neither faster nor more memory efficient. My 2c: they should be banned, but it is too late for that API.

You can of course turn this off:

pd.set_option('mode.chained_assignment', None)

FYI, pandas runs its entire test suite with this option set to 'raise', so we know if chaining is happening.
