As @chrisb said, pandas’ read_csv is probably faster than csv.reader, numpy.genfromtxt, or numpy.loadtxt. I don’t think you will find anything better for parsing the CSV (as a note, read_csv is not a ‘pure Python’ solution, since its CSV parser is implemented in C).
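For illustration, here is a minimal sketch of a read_csv call. The sample data and the column names (id, price, qty) are made up for this example, and io.StringIO stands in for a real file path. Restricting the columns and declaring dtypes up front is one common way to cut parsing work, though how much it helps depends on your data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = "id,price,qty\n1,9.5,3\n2,4.25,10\n3,7.0,1\n"

# The C parser is the default engine; engine="c" only makes that explicit.
df = pd.read_csv(
    io.StringIO(csv_text),
    engine="c",
    usecols=["id", "price"],                    # skip columns you do not need
    dtype={"id": "int64", "price": "float64"},  # avoid type inference
)

print(df)
```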
But if you have to load or query the data often, one solution is to parse the CSV only once and then store it in another format, e.g. HDF5. You can use pandas (with PyTables in the background) to query that efficiently (docs).
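A rough sketch of that parse-once workflow follows, assuming PyTables is installed. The sample data, the column names, the file name data.h5, and the key "data" are all placeholders chosen for this example.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for a large CSV file.
csv_text = "id,price,qty\n1,9.5,3\n2,4.25,10\n3,7.0,1\n"

# Step 1: parse the CSV once.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: store it as HDF5. format="table" makes the store queryable on disk,
# and data_columns lists the columns that may appear in a where clause.
h5_path = os.path.join(tempfile.mkdtemp(), "data.h5")
df.to_hdf(h5_path, key="data", mode="w", format="table", data_columns=["price"])

# Step 3: later runs skip the CSV and load only the rows that match.
expensive = pd.read_hdf(h5_path, key="data", where="price > 5")

print(expensive)
```

The table format is slower to write than the default fixed format, but it is the one that supports where queries, so it fits the "load/query often" case.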
See here for a comparison of the IO performance of HDF5, CSV and SQL with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#performance-considerations
Another possibly relevant question: “Large data” work flows using pandas