Panda's DataFrame - renaming multiple identically named columns

Starting with Pandas 0.19.0 pd.read_csv() has improved support for duplicate column names

So we can try to use the internal method:

In [137]: pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
Out[137]: ['blah', 'blah2', 'blah3', 'blah.1', 'blah.2']

Since Pandas 1.3.0:

pd.io.parsers.base_parser.ParserBase({'names':df.columns, 'usecols':None})._maybe_dedup_names(df.columns)

This is the “magic” function:

def _maybe_dedup_names(self, names):
    # see gh-7160 and gh-9424: this helps to provide
    # immediate alleviation of the duplicate names
    # issue and appears to be satisfactory to users,
    # but ultimately, not needing to butcher the names
    # would be nice!
    if self.mangle_dupe_cols:
        names = list(names)  # so we can index
        counts = {}

        for i, col in enumerate(names):
            cur_count = counts.get(col, 0)

            if cur_count > 0:
                names[i] = '%s.%d' % (col, cur_count)

            counts[col] = cur_count + 1

    return names

Panda’s DataFrame – renaming multiple identically named columns

Leave a Comment Cancel reply

More Related Contents:

Leave a Comment Cancel reply