The answer by srgerg is terribly inefficient, as it operates in quadratic time; here is a linear time solution instead, using Python 2.6-compatible syntax:
import csv
with open('masterlist.csv', 'rb') as master:
master_indices = dict((r[1], i) for i, r in enumerate(csv.reader(master)))
with open('hosts.csv', 'rb') as hosts:
with open('results.csv', 'wb') as results:
reader = csv.reader(hosts)
writer = csv.writer(results)
writer.writerow(next(reader, []) + ['RESULTS'])
for row in reader:
index = master_indices.get(row[3])
if index is not None:
message="FOUND in master list (row {})".format(index)
else:
message="NOT FOUND in master list"
writer.writerow(row + [message])
This produces a dictionary, mapping signatures from masterlist.csv
to a line number first. Lookups in a dictionary take constant time, making the second loop over hosts.csv
rows independant from the number of rows in masterlist.csv
. Not to mention code that’s a lot simpler.
For those using Python 3, the above only needs to have the open()
calls adjusted to open in text mode (remove the b
from the file mode), and you want to add new line=""
so the CSV reader can take control of line separators. You may want to state the encoding to use explicitly rather than rely on your system default (use encoding=...
). The master_indices
mapping can be built with a dictionary comprehension ({r[1]: i for i, r in enumerate(csv.reader(master))}
).