Replicating rows in a pandas data frame by a column value [duplicate]

You can use Index.repeat to get repeated index values based on the column then select from the DataFrame:

df2 = df.loc[df.index.repeat(df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

df2 = df.loc[np.repeat(df.index.values, df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

After which there’s only a bit of cleaning up to do:

df2 = df2.drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Note that if you might have duplicate indices to worry about, you could use .iloc instead:

df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

which uses the positions, and not the index labels.

More Related Contents:

Leave a Comment Cancel reply