Python: how to store a numpy multidimensional array in PyTables?

There may be a simpler way, but as far as I know, this is how you’d go about doing it:

    import numpy as np
    import tables

    # Generate some data
    x = np.random.random((100, 100, 100))

    # Store "x" in a chunked array...
    f = tables.open_file('test.hdf', 'w')
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, 'somename', atom, x.shape)
    ds[:] = …
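For reference, here is a minimal round-trip sketch of the same idea. The excerpt above is cut off, so the final assignment of `x`, the smaller array size, the temporary file path, and the read-back step are all assumptions made for illustration, not the original answer's code.

```python
import os
import tempfile

import numpy as np
import tables

# Smaller than the excerpt's (100, 100, 100) array, to keep the sketch quick
x = np.random.random((20, 20, 20))

# Hypothetical location: a throwaway temp directory for this sketch
path = os.path.join(tempfile.mkdtemp(), "test.hdf")

# Store "x" in a chunked array (CArray); the context manager closes the file
with tables.open_file(path, "w") as f:
    atom = tables.Atom.from_dtype(x.dtype)
    ds = f.create_carray(f.root, "somename", atom, x.shape)
    ds[:] = x  # assumed: the truncated excerpt presumably assigns the data here

# Read the array back to confirm the round trip
with tables.open_file(path, "r") as f:
    y = f.root.somename[:]

roundtrip_ok = np.array_equal(x, y)
```

Using `with` means the file is flushed and closed even if the write raises, which the flattened excerpt does not show.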

Is there an analysis speed or memory usage advantage to using HDF5 for large array storage (instead of flat binary files)?

HDF5 advantages: organization, flexibility, interoperability. Some of the main advantages of HDF5 are its hierarchical structure (similar to folders/files), optional arbitrary metadata stored with each item, and its flexibility (e.g. compression). This organizational structure and metadata storage may sound trivial, but it’s very useful in practice. Another advantage of HDF is that the datasets can …
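To make the three points concrete (hierarchy, per-item metadata, compression), here is a small sketch using PyTables. The group name `run1`, the dataset name `signal`, the attribute `units`, and the zlib settings are hypothetical choices for illustration; the excerpt does not prescribe any of them.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location and sample data for this sketch
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
data = np.arange(1000, dtype=np.float64).reshape(10, 100)

with tables.open_file(path, "w") as f:
    # Hierarchy: groups behave much like folders
    run = f.create_group(f.root, "run1", title="First run")

    # Flexibility: transparent compression is set per dataset
    filters = tables.Filters(complevel=5, complib="zlib")
    arr = f.create_carray(run, "signal", obj=data, filters=filters)

    # Metadata: arbitrary attributes travel with the dataset
    arr.attrs.units = "volts"

with tables.open_file(path, "r") as f:
    node = f.root.run1.signal
    units = str(node.attrs.units)
    restored = node[:]
```

A flat binary file would need a separate sidecar file, or a naming convention, to carry the same shape, dtype, and units information.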

Pandas “Group By” Query on Large Data in HDFStore?

Here’s a complete example.

    import numpy as np
    import pandas as pd
    import os

    fname = "groupby.h5"

    # create a frame
    df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
                       'B': ['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', 'two', 'two', 'one'],
                       'C': ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny', 'shiny', 'dull', 'shiny', …
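Since the excerpt is truncated before the group-by itself, the following is only one plausible way to approach the question, not necessarily the answer's method: read the stored table in chunks, aggregate each chunk, then combine the partial results. The column names `A` and `D`, the key `df`, the chunk size, and the choice of `sum` are assumptions; this combine step works for sums and counts but not directly for something like a median.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file and frame for this sketch
fname = os.path.join(tempfile.mkdtemp(), "groupby.h5")
df = pd.DataFrame({"A": ["foo", "bar"] * 50, "D": np.arange(100.0)})

# "table" format is needed so the store can be read back in chunks
df.to_hdf(fname, key="df", format="table")

partials = []
with pd.HDFStore(fname, mode="r") as store:
    # Only one chunk is held in memory at a time
    for chunk in store.select("df", chunksize=25):
        partials.append(chunk.groupby("A")["D"].sum())

# Combine the per-chunk sums into the final per-group sums
result = pd.concat(partials).groupby(level=0).sum()

# In-memory reference result, for comparison only
expected = df.groupby("A")["D"].sum()
```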

How to get faster code than numpy.dot for matrix multiplication?

np.dot dispatches to BLAS when NumPy has been compiled to use BLAS, a BLAS implementation is available at run-time, your data has one of the dtypes float32, float64, complex64 or complex128, and the data is suitably aligned in memory. Otherwise, it defaults to using its own, slow, matrix multiplication routine. Checking your BLAS linkage is …
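As a rough illustration of those conditions, the sketch below prints NumPy's build configuration and applies a simple dtype-and-layout check to a few arrays. `looks_blas_friendly` is a hypothetical helper, not a NumPy API, and it uses contiguity as a stand-in for "suitably aligned"; NumPy's real dispatch logic is more involved and differs between versions.

```python
import numpy as np

# dtypes with matching BLAS routines (single/double precision, real/complex)
BLAS_DTYPES = (np.float32, np.float64, np.complex64, np.complex128)


def looks_blas_friendly(a):
    """Heuristic only: BLAS-supported dtype and a contiguous memory layout."""
    return a.dtype.type in BLAS_DTYPES and bool(
        a.flags.c_contiguous or a.flags.f_contiguous
    )


# Prints the BLAS/LAPACK libraries this NumPy build was linked against
np.show_config()

a = np.ones((4, 4), dtype=np.float64)  # supported dtype, contiguous
b = np.ones((4, 4), dtype=np.int64)    # integer dtype: no BLAS routine
c = a[::2, ::2]                        # strided view: not contiguous
d = np.ascontiguousarray(c)            # contiguous copy of the view

checks = [looks_blas_friendly(arr) for arr in (a, b, c, d)]
```

If an array fails the check, converting it with `astype` to a floating dtype or copying it with `np.ascontiguousarray` before the multiplication is the usual remedy.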

pytables writes much faster than h5py. Why?

This is an interesting comparison of PyTables and h5py write performance. Typically I use them to read HDF5 files (and usually with a few reads of large datasets), so I haven’t noticed this difference. My thoughts align with @max9111: performance should improve as the number of write operations decreases and the size of the written …