What is the difference between the two ways of accessing the hdf5 group in the SVHN dataset?

First, there is a minor difference in output from your 2 methods.
Method 1: returns the full array (of the encoded file name)
Method 2: only returns the first element (character) of the array
Let’s deconstruct your code to understand what you have. The first part deals with h5py data objects. f['digitStruct'] -> returns a … Read more
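To make the "full array versus first element" distinction concrete, here is a minimal sketch that needs no SVHN file. It assumes the file name is stored as a column of character codes, one row per character; the array `encoded` and the helper `decode_name` are hypothetical stand-ins, not part of the original question.

```python
import numpy as np

# Hypothetical stand-in for a dereferenced name dataset: a column of
# character codes, one row per character (assumed layout).
encoded = np.array([[ord(c)] for c in "1.png"], dtype=np.uint16)

full_array = encoded      # like "Method 1": every character code
first_only = encoded[0]   # like "Method 2": only the first row


def decode_name(codes):
    """Join character codes into a Python string."""
    return "".join(chr(int(c)) for c in np.asarray(codes).flatten())


full_name = decode_name(full_array)
first_char = decode_name(first_only)
print(full_name, first_char)
```

Decoding the full array recovers the whole file name, while decoding only the first element yields a single character.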

Is there an analysis speed or memory usage advantage to using HDF5 for large array storage (instead of flat binary files)?

HDF5 Advantages: Organization, flexibility, interoperability

Some of the main advantages of HDF5 are its hierarchical structure (similar to folders/files), optional arbitrary metadata stored with each item, and its flexibility (e.g. compression). This organizational structure and metadata storage may sound trivial, but it’s very useful in practice. Another advantage of HDF is that the datasets can … Read more
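The three features named above (hierarchy, metadata, compression) can be sketched with h5py as follows. The file name, group name, dataset name, and attribute values are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "experiment.h5")
data = np.arange(10_000, dtype=np.float64).reshape(100, 100)

with h5py.File(path, "w") as f:
    run = f.create_group("run_001")           # groups behave like folders
    dset = run.create_dataset(
        "temperature",
        data=data,
        compression="gzip",                   # optional per-dataset compression
        compression_opts=4,
    )
    dset.attrs["units"] = "kelvin"            # metadata attached to the dataset
    run.attrs["operator"] = "example"         # metadata attached to the group

with h5py.File(path, "r") as f:
    dset = f["run_001/temperature"]           # path-like access into the hierarchy
    units = dset.attrs["units"]
    compression = dset.compression
    corner = dset[:2, :3]                     # reads only this slice from disk
```

Slicing the dataset object pulls just the requested region into memory, which is part of what makes the format convenient for large arrays.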

How to read a v7.3 mat file via h5py?

The MATLAB v7.3 file format is not especially easy to work with in h5py. It relies on HDF5 references; cf. the h5py documentation on references.

    >>> import h5py
    >>> f = h5py.File('test.mat', 'r')
    >>> list(f.keys())
    ['#refs#', 'struArray']
    >>> struArray = f['struArray']
    >>> struArray['name'][0, 0]  # this is the HDF5 reference
    <HDF5 object reference>
    >>> f[struArray['name'][0, 0]][()]  # this is … Read more
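Since a real `test.mat` is not available here, the following sketch builds a small HDF5 file that imitates the layout shown above (a `#refs#` group plus a `struArray` group whose `name` dataset holds an object reference) and then dereferences it. The stored string "abc" and the uint16 character-code layout are assumptions made for the demo.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "refs_demo.h5")  # hypothetical file

# Build a file that mimics the layout described above (assumed, simplified).
with h5py.File(path, "w") as f:
    refs = f.create_group("#refs#")
    target = refs.create_dataset(
        "a", data=np.array([[ord(c)] for c in "abc"], dtype=np.uint16)
    )
    stru = f.create_group("struArray")
    name = stru.create_dataset("name", (1, 1), dtype=h5py.ref_dtype)
    name[0, 0] = target.ref                   # store an object reference

with h5py.File(path, "r") as f:
    ref = f["struArray"]["name"][0, 0]        # this is the HDF5 reference
    is_ref = isinstance(ref, h5py.Reference)
    codes = f[ref][()]                        # dereference, then read all values
    text = "".join(chr(int(c)) for c in codes.flatten())
```

Indexing the file object with a reference returns the object it points to, so `f[ref]` behaves like any other dataset.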

Input and output numpy arrays to h5py

h5py provides a model of datasets and groups. The former are basically arrays, and the latter you can think of as directories. Each is named. You should look at the documentation for the API and examples: http://docs.h5py.org/en/latest/quick.html A simple example where you are creating all of the data upfront and just want to save it … Read more
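A minimal sketch of the round trip, assuming the whole array fits in memory; the file name and the dataset name `my_array` are placeholders.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "arrays.h5")  # hypothetical file
arr = np.random.random((4, 3))

# Output: store the array as a named dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("my_array", data=arr)

# Input: slicing with [:] copies the dataset into a NumPy array.
with h5py.File(path, "r") as f:
    loaded = f["my_array"][:]
```

The copy made by `[:]` stays valid after the file is closed, whereas the dataset object itself does not.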

Pandas can’t read hdf5 file created with h5py

I’ve worked a little on the pytables module in pandas.io, and from what I know, pandas interaction with HDF files is limited to specific structures that pandas understands. To see what these look like, you can try

    import pandas as pd
    import numpy as np
    pd.DataFrame(np.zeros((3, 5), dtype=np.float32)).to_hdf('test.h5', key='test')

If you open 'test.h5' in HDFView, you will see … Read more
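One possible workaround, sketched here under the assumption that the h5py file holds a plain 2-D numeric dataset, is to read the data with h5py and wrap it in a DataFrame yourself. The file name and the dataset name `values` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "plain.h5")  # hypothetical file

# A file written directly with h5py has no pandas-specific layout.
with h5py.File(path, "w") as f:
    f.create_dataset("values", data=np.zeros((3, 5), dtype=np.float32))

# Read the raw array with h5py, then build the DataFrame by hand.
with h5py.File(path, "r") as f:
    df = pd.DataFrame(f["values"][:])
```

Column labels and the index are not stored in such a file, so they have to be supplied separately if they matter.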

Incremental writes to hdf5 with h5py

Per the FAQ, you can expand the dataset using dset.resize. For example,

    import os
    import h5py
    import numpy as np

    path = "/tmp/out.h5"
    if os.path.exists(path):
        os.remove(path)  # start from a clean file
    with h5py.File(path, "a") as f:
        dset = f.create_dataset('voltage284', (10**5,), maxshape=(None,),
                                dtype="i8", chunks=(10**4,))
        dset[:] = np.random.random(dset.shape)
        print(dset.shape)
        # (100000,)
        for i in range(3):
            dset.resize(dset.shape[0] + 10**4, axis=0)
            dset[-10**4:] = np.random.random(10**4)
            print(dset.shape)
            # (110000,)
            # (120000,)

… Read more
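The same resize pattern can be wrapped in a small helper. This is only a sketch: `append_rows`, the dataset name `samples`, and the chunk size are illustrative choices, not something the excerpt prescribes.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, rows):
    """Grow a 1-D resizable dataset and write `rows` at the end."""
    rows = np.asarray(rows)
    old_len = dset.shape[0]
    dset.resize(old_len + rows.shape[0], axis=0)
    dset[old_len:] = rows


path = os.path.join(tempfile.mkdtemp(), "incremental.h5")  # hypothetical file

with h5py.File(path, "w") as f:
    # maxshape=(None,) makes the first axis unlimited; this requires chunking.
    dset = f.create_dataset(
        "samples", shape=(0,), maxshape=(None,), dtype="f8", chunks=(1024,)
    )
    for i in range(3):
        append_rows(dset, np.full(4, float(i)))
    final_shape = dset.shape
    contents = dset[:]
```

Starting from an empty dataset avoids having to know the final size up front.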

pytables writes much faster than h5py. Why?

This is an interesting comparison of PyTables and h5py write performance. Typically I use them to read HDF5 files (usually with a few reads of large datasets), so I haven’t noticed this difference. My thoughts align with @max9111: that performance should improve as the number of write operations decreases as the size of the written … Read more
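To explore how the number of write operations affects timing, a rough h5py-only sketch like the one below can be used. It does not reproduce the PyTables comparison; the array size and function names are arbitrary, and measured times will vary by machine.

```python
import os
import tempfile
import time

import h5py
import numpy as np

data = np.random.random((2000, 16))
workdir = tempfile.mkdtemp()


def write_row_by_row(path, arr):
    """One write operation per row; returns elapsed seconds."""
    start = time.perf_counter()
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("data", shape=arr.shape, dtype=arr.dtype)
        for i in range(arr.shape[0]):
            dset[i] = arr[i]
    return time.perf_counter() - start


def write_in_one_call(path, arr):
    """A single write operation for the whole array; returns elapsed seconds."""
    start = time.perf_counter()
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("data", shape=arr.shape, dtype=arr.dtype)
        dset[...] = arr
    return time.perf_counter() - start


many_path = os.path.join(workdir, "many_writes.h5")
one_path = os.path.join(workdir, "one_write.h5")
t_many = write_row_by_row(many_path, data)
t_one = write_in_one_call(one_path, data)
print(f"row by row: {t_many:.4f} s, single call: {t_one:.4f} s")

# Both strategies should leave identical data on disk.
with h5py.File(many_path, "r") as f_many, h5py.File(one_path, "r") as f_one:
    same = np.array_equal(f_many["data"][:], f_one["data"][:])
```

Both files end up with the same contents, so any timing difference comes from how the writes are issued rather than from what is stored.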