zea.data.dataloader

H5 dataloader for loading images from zea datasets.

Functions

generate_h5_indices(file_paths, file_shapes, ...)

Generate indices for h5 files.

Classes

H5Generator(file_paths[, key, n_frames, ...])

H5Generator class for iterating over hdf5 files in an advanced way.

class zea.data.dataloader.H5Generator(file_paths, key='data/image', n_frames=1, shuffle=True, return_filename=False, limit_n_samples=None, limit_n_frames=None, seed=None, cache=False, additional_axes_iter=None, sort_files=True, overlapping_blocks=False, initial_frame_axis=0, insert_frame_axis=True, frame_index_stride=1, frame_axis=-1, validate=True, **kwargs)[source]

Bases: Dataset

H5Generator class for iterating over HDF5 files in an advanced way. It is mostly used internally; you might want to use the Dataloader class instead. Loads one item at a time and always outputs NumPy arrays.

Initializes the Dataset.

Parameters:
  • file_paths (List[str]) – Path or list of paths to the folder(s) containing the HDF5 file(s), or a list of HDF5 file paths. Can be a mixed list of folders and files.

  • validate (bool) – Whether to validate the dataset. Defaults to True.

  • directory_splits (list, optional) – Fractions by which to split each directory. A list of floats between 0 and 1, with the same length as the number of file_paths given. If None, all files in file_paths are used.

iterator()[source]

Generator that yields images from the hdf5 files.

load(file, key, indices=None)[source]

Extract data from hdf5 file.

Parameters:
  • file_name (str) – name of the file to extract image from.

  • key (str) – key of the hdf5 dataset to grab data from.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – indices to extract image from (tuple of slices).

Returns:

image extracted from hdf5 file and indexed by indices.

Return type:

np.ndarray
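To make the accepted forms of indices concrete, here is a minimal sketch that applies each form to an in-memory NumPy array standing in for an HDF5 dataset. The helper name extract is hypothetical and is not part of zea, and treating indices=None as "take everything" is an assumption. Indexing a real HDF5 dataset may follow stricter rules than NumPy.

```python
import numpy as np


def extract(dataset, indices=None):
    """Index an array-like with an `indices` value (hypothetical helper, not zea's load)."""
    if indices is None:
        # Assumption: no indices means the whole dataset is returned.
        return np.asarray(dataset[...])
    if isinstance(indices, tuple):
        # Convert ranges to explicit lists of positions before indexing.
        indices = tuple(list(i) if isinstance(i, range) else i for i in indices)
    return np.asarray(dataset[indices])


# Stand-in for an HDF5 dataset: 4 frames of 8 x 8 pixels.
data = np.arange(4 * 8 * 8).reshape(4, 8, 8)

# Tuple form: frames 1 and 2, full extent of the remaining axes.
block = extract(data, (range(1, 3), slice(None, 8, None), slice(None, 8, None)))

# Other forms from the type annotation: a single int and a list of ints.
single_frame = extract(data, 0)
two_frames = extract(data, [0, 2])
```

In this sketch the tuple form keeps the frame axis (shape (2, 8, 8)), while a single int drops it (shape (8, 8)).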

summary()[source]

Return a string with dataset statistics and per-directory breakdown.

zea.data.dataloader.generate_h5_indices(file_paths, file_shapes, n_frames, frame_index_stride, key='data/image', initial_frame_axis=0, additional_axes_iter=None, sort_files=True, overlapping_blocks=False, limit_n_frames=None)[source]

Generate indices for h5 files.

Generates a list of indices to extract images from hdf5 files. Length of this list is the length of the extracted dataset.

Parameters:
  • file_paths (List[str]) – List of file paths.

  • file_shapes (list) – List of file shapes.

  • n_frames (int) – Number of frames to load from each hdf5 file.

  • frame_index_stride (int) – Interval between frames to load.

  • key (str) – Key of hdf5 dataset to grab data from. Defaults to “data/image”.

  • initial_frame_axis (int) – Axis to iterate over. Defaults to 0.

  • additional_axes_iter (Optional[List[int]]) – Additional axes to iterate over in the dataset. Defaults to None.

  • sort_files (bool) – Sort files by number. Defaults to True.

  • overlapping_blocks (bool) – If True, take n_frames from the sequence, then move forward by 1 frame, so that consecutive blocks overlap. Defaults to False.

  • limit_n_frames (int | None) – Limit the number of frames to load from each file. This means that only the first limit_n_frames frames of each data file will be used. Defaults to None.

Returns:

List of tuples with indices to extract images from hdf5 files.

Each tuple is (file_name, key, indices), with indices being a tuple of slices (the frame axis may be a range, as in the example below).

Return type:

list

Example

[
    (
        "/folder/path_to_file.hdf5",
        "data/image",
        (range(0, 1), slice(None, 256, None), slice(None, 256, None)),
    ),
    (
        "/folder/path_to_file.hdf5",
        "data/image",
        (range(1, 2), slice(None, 256, None), slice(None, 256, None)),
    ),
    ...,
]
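The following sketch shows one way a list like the one above could be produced for a single file. It is not zea's implementation: the function names frame_blocks and sketch_indices are hypothetical, the frame axis is assumed to be axis 0, and the rule that a block spans (n_frames - 1) * frame_index_stride + 1 frames is an assumption made for illustration.

```python
from typing import List, Optional, Tuple


def frame_blocks(
    n_total: int,
    n_frames: int,
    stride: int = 1,
    overlapping: bool = False,
    limit: Optional[int] = None,
) -> List[range]:
    """Return one range of frame indices per block (hypothetical helper)."""
    if limit is not None:
        # Only the first `limit` frames of the file are considered.
        n_total = min(n_total, limit)
    # Assumption: a block covers this many consecutive frames in the file.
    span = (n_frames - 1) * stride + 1
    # Overlapping blocks advance by one frame, otherwise by a whole block.
    step = 1 if overlapping else span
    return [
        range(start, start + span, stride)
        for start in range(0, n_total - span + 1, step)
    ]


def sketch_indices(
    file_path: str,
    shape: Tuple[int, ...],
    n_frames: int,
    stride: int = 1,
    key: str = "data/image",
    overlapping: bool = False,
    limit: Optional[int] = None,
) -> list:
    """Build (file_name, key, indices) tuples for one file, frames on axis 0."""
    blocks = frame_blocks(shape[0], n_frames, stride, overlapping, limit)
    # Every non-frame axis is taken in full.
    rest = tuple(slice(None, dim, None) for dim in shape[1:])
    return [(file_path, key, (block, *rest)) for block in blocks]


indices = sketch_indices("/folder/path_to_file.hdf5", (3, 256, 256), n_frames=1)
```

With n_frames=1 and a file of shape (3, 256, 256), the first two entries of indices have the same form as the two entries shown in the example above. Passing overlapping=True with n_frames=2 yields blocks that start at frames 0, 1, 2, and so on, rather than 0, 2, 4.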