Which is faster to load: pickle or HDF5 in Python?

UPDATE: nowadays I would choose between Parquet, Feather (Apache Arrow), HDF5 and Pickle. Pros and cons:

Parquet

pros:
- one of the fastest and most widely supported binary storage formats
- supports very fast compression methods (for example the Snappy codec)
- de facto standard storage format for Data Lakes / Big Data

cons:
- the whole dataset must be read into memory

… Read more
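To get a rough feel for how these formats behave on your own data, a minimal timing sketch with pandas might look like the one below. It assumes pyarrow is installed for Parquet/Feather and PyTables for HDF5; the DataFrame shape and the file names are made up for illustration and are not taken from the original answer.

import time
import numpy as np
import pandas as pd

# Hypothetical example data; swap in your real DataFrame.
df = pd.DataFrame(np.random.rand(1_000_000, 10),
                  columns=[f"col{i}" for i in range(10)])

writers = {
    "parquet": lambda: df.to_parquet("data.parquet", compression="snappy"),
    "feather": lambda: df.to_feather("data.feather"),
    "hdf5":    lambda: df.to_hdf("data.h5", key="df", mode="w"),
    "pickle":  lambda: df.to_pickle("data.pkl"),
}
readers = {
    "parquet": lambda: pd.read_parquet("data.parquet"),
    "feather": lambda: pd.read_feather("data.feather"),
    "hdf5":    lambda: pd.read_hdf("data.h5", key="df"),
    "pickle":  lambda: pd.read_pickle("data.pkl"),
}

for name in writers:
    t0 = time.perf_counter()
    writers[name]()          # write the file in this format
    t1 = time.perf_counter()
    readers[name]()          # read it back
    t2 = time.perf_counter()
    print(f"{name:8s} write: {t1 - t0:.2f}s  read: {t2 - t1:.2f}s")

Results depend heavily on dtypes (strings vs. numeric) and compression settings, so it is worth timing your actual workload rather than trusting a single benchmark.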

HDF5 – concurrency, compression & I/O performance [closed]

Updated to use pandas 0.13.1.

1) No. See http://pandas.pydata.org/pandas-docs/dev/io.html#notes-caveats. There are various ways to do this, e.g. have your different threads/processes write out the computation results, then have a single process combine them (a sketch of that pattern follows below).

2) Depending on the type of data you store, how you do it, and how you want to retrieve it, HDF5 can offer vastly better performance. … Read more
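A minimal sketch of that write-then-combine pattern is shown below. The worker function compute() and the file names are hypothetical placeholders, not the original answer's code: each process writes its own HDF5 file, and a single process concatenates the parts afterwards.

import multiprocessing as mp
import numpy as np
import pandas as pd

def compute(i):
    # Placeholder for the real per-worker computation.
    df = pd.DataFrame({"worker": i, "value": np.random.rand(100)})
    path = f"part_{i}.h5"
    df.to_hdf(path, key="df", mode="w")   # each worker writes its own file
    return path

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        paths = pool.map(compute, range(4))

    # A single process combines the per-worker results into one store.
    combined = pd.concat([pd.read_hdf(p, key="df") for p in paths],
                         ignore_index=True)
    combined.to_hdf("combined.h5", key="df", mode="w")

This sidesteps the concurrency caveats because no two processes ever open the same HDF5 file for writing.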

How to read HDF5 files in Python

Read HDF5:

import h5py

filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # Print all root level object names (aka keys)
    # these can be group or dataset names
    print("Keys: %s" % f.keys())
    # get first object name/key; may or may NOT be a group
    a_group_key = list(f.keys())[0]
    # get the object type for a_group_key: … Read more
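To continue roughly where the excerpt cuts off, one might check whether that first object is a group or a dataset and, if it is a dataset, read it into a NumPy array. This is only a sketch assuming the same "file.hdf5"; the actual group/dataset layout of your file is unknown here.

import h5py

with h5py.File("file.hdf5", "r") as f:
    a_group_key = list(f.keys())[0]
    obj = f[a_group_key]
    if isinstance(obj, h5py.Group):
        # A group: list its members (sub-groups or datasets).
        print("Group members:", list(obj.keys()))
    elif isinstance(obj, h5py.Dataset):
        # A dataset: read the whole thing into memory as a NumPy array.
        data = obj[()]
        print("Dataset shape:", data.shape, "dtype:", data.dtype)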