As of pandas 2.0, append (previously deprecated) was removed. You need to use concat instead (for most applications):
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
As noted by @cottontail, it’s also possible to use loc, although this only adds a row if the new index label is not already present in the DataFrame; otherwise the existing row is overwritten (typically, the label is new when the index is a RangeIndex):
df.loc[len(df)] = new_row # only use with a RangeIndex!
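To make the difference between the two approaches concrete, here is a minimal sketch. The column names, the sample values, and the variable names (`via_concat`, `via_loc`, `shifted`) are made up for illustration; `new_row` is assumed to be a plain list of values in column order.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
new_row = [5, 6]  # illustrative values, one per column

# concat returns a new DataFrame; ignore_index=True renumbers the rows
via_concat = pd.concat(
    [df, pd.DataFrame([new_row], columns=df.columns)],
    ignore_index=True,
)

# loc enlarges the frame in place; safe here because the labels are 0..n-1
via_loc = df.copy()
via_loc.loc[len(via_loc)] = new_row

# Pitfall: with labels [1, 2], len(shifted) == 2 is an existing label,
# so the assignment replaces that row instead of adding a new one
shifted = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[1, 2])
shifted.loc[len(shifted)] = new_row
```

In this sketch `via_concat` and `via_loc` both end up with three rows, while `shifted` still has two, which is why the loc shortcut is best reserved for a default RangeIndex.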
Why was it removed?
We frequently see new users of pandas try to code like they would in pure Python. They use iterrows to access items in a loop (see here why you shouldn’t), or append in a way that is similar to Python’s list.append.
However, as noted in pandas’ issue #35407, pandas’ append and list.append are really not the same thing. list.append is in place, while pandas’ append creates a new DataFrame:
I think that we should deprecate Series.append and DataFrame.append. They’re making an analogy to list.append, but it’s a poor analogy since the behavior isn’t (and can’t be) in place. The data for the index and values needs to be copied to create the result.
These are also apparently popular methods. DataFrame.append is around the 10th most visited page in our API docs.
Unless I’m mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single concat.
As a consequence, while list.append is amortized O(1) at each step of the loop, pandas’ append is O(n), making it inefficient when repeated insertion is performed.
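The in-place versus copy distinction can be checked directly. The sketch below uses concat as a stand-in for the removed append, since both build a new object rather than modifying the existing one; the variable names and values are illustrative only.

```python
import pandas as pd

# list.append mutates the existing list object
values = [1, 2]
same_list = values
values.append(3)  # same_list sees the new element too

# concat builds a new DataFrame and leaves the input untouched
df = pd.DataFrame({"A": [1, 2]})
original = df
df = pd.concat([df, pd.DataFrame({"A": [3]})], ignore_index=True)

print(same_list is values)  # the list is still one object
print(df is original)       # the DataFrame is a different object
print(len(original), len(df))
```

Because each such step copies all existing rows, adding n rows one at a time copies on the order of n²/2 rows in total.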
What if I need to repeat the process?
Using append or concat repeatedly is not a good idea (this has quadratic behavior, as it creates a new DataFrame at each step).
In such a case, the new items should be collected in a list and, at the end of the loop, converted to a DataFrame and then concatenated to the original DataFrame.
lst = []
for new_row in items_generation_logic:
    lst.append(new_row)
# create extension
df_extended = pd.DataFrame(lst, columns=['A', 'B', 'C'])
# or columns=df.columns if identical columns
# concatenate to original
out = pd.concat([df, df_extended])
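For a version that runs end to end, the sketch below fills in the placeholder with a hypothetical generator, `generate_rows`, and a made-up starting DataFrame; neither comes from the original answer. It also passes `ignore_index=True`, which is an assumption about the desired result.

```python
import pandas as pd

df = pd.DataFrame({"A": [0], "B": [0], "C": [0]})  # made-up starting data


def generate_rows(n):
    """Hypothetical stand-in for the row-producing logic."""
    for i in range(1, n + 1):
        yield {"A": i, "B": i * 2, "C": i * 3}


# collect the rows in a plain Python list (cheap, amortized O(1) per row)
rows = []
for new_row in generate_rows(3):
    rows.append(new_row)

# build the extension once, then concatenate once
df_extended = pd.DataFrame(rows, columns=df.columns)
out = pd.concat([df, df_extended], ignore_index=True)
```

Without `ignore_index=True`, the row labels of both frames are kept as they are, so labels can repeat in the result.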