How to add a new column to the beginning of the rows of a CSV file?

It would be fairly easy to do using the csv module’s DictReader and DictWriter classes. Here’s an example that reads the old file and writes the new one in a single pass.

A DictReader instance returns each logical line or row of the file as a dictionary whose keys are the field names. You can explicitly specify the field names or they can be read from the first line of the file (as is done in the example below).
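
To make the two ways of getting field names concrete, here is a minimal sketch. It uses in-memory io.StringIO objects with made-up sample data in place of real files, so the variable names and contents are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample data, held in memory instead of a file on disk.
with_header = io.StringIO("ID,view\nid 1,N\nid 2,Y\n")
without_header = io.StringIO("id 1,N\nid 2,Y\n")

# Field names are taken from the first line of the input.
reader = csv.DictReader(with_header)
rows_from_header = list(reader)

# Field names are supplied explicitly, so every line is treated as data.
explicit_reader = csv.DictReader(without_header, fieldnames=["ID", "view"])
rows_explicit = list(explicit_reader)

print(reader.fieldnames)             # names read from the header line
print(rows_from_header[0]["view"])   # each row is keyed by field name
print(rows_explicit == rows_from_header)
```

Either way, each row ends up keyed by the same field names; the only difference is whether the first line is consumed as a header.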

You must specify the desired field names when creating a DictWriter instance, and the order of the field names defines the order in which they will appear on each line of the output file. In this case the new field name is simply added to the beginning of the list of names from the input file, whatever they may be.

import csv

with open('testdata.txt', 'r', newline="") as inf, \
     open('testdata2.txt', 'w', newline="") as outf:
    csvreader = csv.DictReader(inf)
    fieldnames = ['Node'] + csvreader.fieldnames  # Add column name to beginning.
    csvwriter = csv.DictWriter(outf, fieldnames)
    csvwriter.writeheader()
    for node, row in enumerate(csvreader, start=1):
        csvwriter.writerow(dict(row, Node="node %s" % node))

If these were the contents of the input file:

ID,Test Description,file-name,module,view,path1,path2
id 1,test 1 desc,test1file.txt,test1module,N,test1path1,test1path2
id 2,test 2 desc,test2file.txt,test2module,Y,test2path1,test2path2
id 3,test 3 desc,test3file.txt,test3module,Y,test3path1,test3path2
id 4,test 4 desc,test4file.txt,test4module,N,test4path1,test4path2
id 5,test 5 desc,test5file.txt,test5module,Y,test5path1,test5path2

These would be the contents of the resulting output file after running the script:

Node,ID,Test Description,file-name,module,view,path1,path2
node 1,id 1,test 1 desc,test1file.txt,test1module,N,test1path1,test1path2
node 2,id 2,test 2 desc,test2file.txt,test2module,Y,test2path1,test2path2
node 3,id 3,test 3 desc,test3file.txt,test3module,Y,test3path1,test3path2
node 4,id 4,test 4 desc,test4file.txt,test4module,N,test4path1,test4path2
node 5,id 5,test 5 desc,test5file.txt,test5module,Y,test5path1,test5path2

Note that adding the data for a field to each row with dict(row, Node="node %s" % node) as shown only works when the field name is a valid keyword argument name (i.e. a valid Python identifier), like Node.

Valid identifiers consist only of letters, digits, and underscores, must not start with a digit, and cannot be a language keyword such as class, for, return, global, pass, etc.
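
If you are not sure whether a given column name qualifies, you can check it in code. The sketch below combines str.isidentifier() with keyword.iskeyword() from the standard library; the helper name is_valid_keyword_name is made up for this example.

```python
import keyword

def is_valid_keyword_name(name):
    """Return True if name can be written as a keyword argument name.

    Hypothetical helper: the name must be an identifier and not a
    reserved word.
    """
    return name.isidentifier() and not keyword.iskeyword(name)

# Check a few candidate column names.
for candidate in ("Node", "file-name", "2nd", "class"):
    print(candidate, is_valid_keyword_name(candidate))
```

Of these, only Node passes: file-name contains a hyphen, 2nd starts with a digit, and class is a reserved word.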

The workaround for this limitation is to update each row dictionary manually since the field name cannot be used as a keyword argument:

    fieldnames = ['Invalid-Identifier'] + csvreader.fieldnames  # Add column name.
    ...
    for node, row in enumerate(csvreader, 1):
        row['Invalid-Identifier'] = 'node %s' % node  # add new field and value
        csvwriter.writerow(row)
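
For completeness, here is a self-contained sketch of that workaround which runs without any files on disk. The function name prepend_column, its template parameter, and the sample data are all assumptions made for illustration, and it assumes the input has a header line (otherwise reader.fieldnames would be None).

```python
import csv
import io

def prepend_column(inf, outf, name, template="node %s"):
    """Copy CSV rows from inf to outf, adding a numbered column at the front.

    Hypothetical helper; assumes the input starts with a header line.
    """
    reader = csv.DictReader(inf)
    writer = csv.DictWriter(outf, [name] + reader.fieldnames)
    writer.writeheader()
    for number, row in enumerate(reader, start=1):
        row[name] = template % number  # plain item assignment works for any name
        writer.writerow(row)

# Made-up sample input, held in memory in place of a real file.
source = io.StringIO("ID,file-name\nid 1,test1file.txt\nid 2,test2file.txt\n")
result = io.StringIO()
prepend_column(source, result, "Invalid-Identifier")
print(result.getvalue())
```

With real files you would pass in the objects returned by open(..., newline="") instead of the StringIO instances.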
