Iterating through a CSV and pulling out the last line of the day for each element

You can also consider using itertools.groupby.
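One detail matters here: itertools.groupby only groups consecutive items that share a key, which is why the rows must be sorted before grouping. A minimal illustration with a made-up list of symbols:

```python
from itertools import groupby

# groupby starts a new group every time the key changes, so unsorted
# input can produce the same key more than once.
symbols = ["CZ", "JB", "CZ", "I"]

unsorted_keys = [k for k, _ in groupby(symbols)]
sorted_keys = [k for k, _ in groupby(sorted(symbols))]

print(unsorted_keys)  # ['CZ', 'JB', 'CZ', 'I']
print(sorted_keys)    # ['CZ', 'I', 'JB']
```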

If your data is already loaded as a list of rows like this:

data = [
    ['CZ', '12/27/07 3:55 PM', '1198788900', '42345', '42346'],
    ['CZ', '12/27/07 5:30 PM', '1198794600', '42346', '42300'],
    ['CZ', '12/27/07 7:05 PM', '1198800300', '42300', '42000'],
    ['JB', '12/27/07 7:05 PM', '1198800300', '13722', '13500'],
    ['I', '12/27/07 7:05 PM', '1198800300', '4475', '4572'],
]
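If the rows are still in a CSV file, the standard csv module can produce that list of lists. This is a minimal sketch assuming the file has no header row and the same five columns as above; the inline sample text stands in for a real file (for a file on disk you would pass open(path, newline="") to csv.reader instead).

```python
import csv
import io

# Stand-in for a CSV file; assumes no header row and the column order
# symbol, date/time, Unix timestamp, value, value.
csv_text = """CZ,12/27/07 3:55 PM,1198788900,42345,42346
CZ,12/27/07 5:30 PM,1198794600,42346,42300
CZ,12/27/07 7:05 PM,1198800300,42300,42000
JB,12/27/07 7:05 PM,1198800300,13722,13500
I,12/27/07 7:05 PM,1198800300,4475,4572
"""

# csv.reader yields each line as a list of strings
data = list(csv.reader(io.StringIO(csv_text)))

print(data[0])  # ['CZ', '12/27/07 3:55 PM', '1198788900', '42345', '42346']
```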

Then you can do this:

from itertools import groupby

# Truncate the time portion of the second column, keeping only the date
for row in data:
    row[1] = row[1].split(" ")[0]

# Sort by symbol, then by Unix timestamp, so each day's rows are adjacent
data = sorted(data, key=lambda x: (x[0], int(x[2])))

# Group by (symbol, date); the last row of each group is the day's last line
for k, g in groupby(data, key=lambda x: x[:2]):
    before, after = list(g)[-1][-2:]  # last two columns of the last row
    k.append("1" if int(after) > int(before) else "-1")
    print(",".join(k))

This produces the following output:

CZ,12/27/07,-1
I,12/27/07,1
JB,12/27/07,-1
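The same steps can be wrapped in a function that returns the results instead of printing them and leaves the caller's rows unmodified. This is a Python 3 sketch; the function name day_end_direction is hypothetical, and, like the code above, it assumes the last two columns are the values to compare and the third column is a Unix timestamp.

```python
from itertools import groupby


def day_end_direction(rows):
    """Return 'symbol,date,flag' strings for the last row of each symbol/day.

    flag is "1" if the last column is greater than the second-to-last
    column on that day's final row, otherwise "-1".
    """
    # Build new rows with the time stripped from the date column,
    # so the input lists are not modified.
    trimmed = [[r[0], r[1].split(" ")[0]] + r[2:] for r in rows]

    # Sort by symbol, then Unix timestamp, so each day's rows are adjacent
    # and the last row of each group is the latest one that day.
    trimmed.sort(key=lambda r: (r[0], int(r[2])))

    results = []
    for (symbol, date), group in groupby(trimmed, key=lambda r: (r[0], r[1])):
        last = list(group)[-1]
        before, after = int(last[-2]), int(last[-1])
        flag = "1" if after > before else "-1"
        results.append(",".join([symbol, date, flag]))
    return results


rows = [
    ['CZ', '12/27/07 3:55 PM', '1198788900', '42345', '42346'],
    ['CZ', '12/27/07 5:30 PM', '1198794600', '42346', '42300'],
    ['CZ', '12/27/07 7:05 PM', '1198800300', '42300', '42000'],
    ['JB', '12/27/07 7:05 PM', '1198800300', '13722', '13500'],
    ['I', '12/27/07 7:05 PM', '1198800300', '4475', '4572'],
]

for line in day_end_direction(rows):
    print(line)
```

With the sample rows this prints the same three lines as above, and rows[0][1] still contains the time.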
