Dataset

Header file: <libs/zarr/dataset.hpp>

template<typename Store>
class Dataset

A class representing a dataset made from a Zarr group (i.e. a collection of Zarr arrays) in a storage system.

This class provides functionality to create a dataset as a group of arrays that obeys the Zarr storage specification version 2 (https://zarr.readthedocs.io/en/stable/spec/v2.html) and is also compatible with Xarray and NetCDF.

Template Parameters:

Store – The type of the store object used by the dataset.

Public Functions

inline explicit Dataset(Store &store)

Constructs a Dataset with the specified store object.

This constructor initializes a Dataset with the provided store object by initializing a ZarrGroup and writing some additional metadata for Xarray and NetCDF.

Parameters:

store – The store object associated with the Dataset.
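In a Zarr v2 store, a group is marked by a .zgroup metadata key whose JSON content is {"zarr_format": 2}. The sketch below shows only that step of initializing a group, using a hypothetical in-memory store (a std::map from keys to JSON text) in place of the Store template parameter. The additional Xarray and NetCDF metadata that this constructor also writes is not modelled, since its exact content is not described here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory key-value store standing in for the Store template
// parameter; a real store (e.g. a directory on disk) maps keys to files.
using MemoryStore = std::map<std::string, std::string>;

// Mark the root of the store as a Zarr v2 group by writing its `.zgroup` key.
void init_zarr_group(MemoryStore &store) {
  store[".zgroup"] = "{\"zarr_format\": 2}";
}
```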

inline size_t get_dimension(const std::string &dimname) const

Returns the size of an existing dimension in the dataset.

Parameters:

dimname – The name of the dimension in the dataset.

Returns:

The size of (i.e. number of elements along) the dimension.

inline void set_dimension(const std::pair<std::string, size_t> &dim)

Sets the size of an existing dimension in the dataset.

Parameters:

dim – A pair containing the name of the dimension and the new size to set.
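The dimension bookkeeping behind get_dimension and set_dimension can be pictured as a map from dimension names to sizes. The following is a minimal sketch under that assumption. The class DimensionRegistry is hypothetical and not part of this library, and its error handling for unknown names is an assumption rather than documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a dataset's dimension bookkeeping. It mirrors the
// private `datasetdims` map of Dataset but is NOT the library's class.
class DimensionRegistry {
 public:
  // Register a new dimension; unordered_map::insert leaves an existing entry
  // with the same name unchanged.
  void add_dimension(const std::pair<std::string, std::size_t> &dim) {
    datasetdims_.insert(dim);
  }

  // Size of an existing dimension; at() throws std::out_of_range for an
  // unknown name (this error behaviour is an assumption of the sketch).
  std::size_t get_dimension(const std::string &dimname) const {
    return datasetdims_.at(dimname);
  }

  // Overwrite the size of an existing dimension, rejecting unknown names.
  void set_dimension(const std::pair<std::string, std::size_t> &dim) {
    const auto it = datasetdims_.find(dim.first);
    if (it == datasetdims_.end()) {
      throw std::invalid_argument("unknown dimension: " + dim.first);
    }
    it->second = dim.second;
  }

 private:
  std::unordered_map<std::string, std::size_t> datasetdims_;
};
```

Here a dimension is first registered (as the private add_dimension does for the dataset) and its size is later overwritten with set_dimension, for instance once more time steps have been written.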

template<typename T>
inline XarrayZarrArray<Store, T> create_array(const std::string_view name, const std::string_view units, const double scale_factor, const std::vector<size_t> &chunkshape, const std::vector<std::string> &dimnames) const

Creates a new array in the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • name – The name of the new array.

  • units – The units of the array data.

  • scale_factor – The scale factor of the array data.

  • chunkshape – The shape of the chunks of the array.

  • dimnames – The names of each dimension of the array.

Returns:

An instance of XarrayZarrArray representing the newly created array.
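The units, scale_factor and dimnames arguments are the kind of information Xarray and NetCDF tools expect to find in an array's attributes. As a hedged illustration, the hypothetical helper make_zattrs below builds the JSON text of a Zarr v2 .zattrs file containing `_ARRAY_DIMENSIONS` (the attribute from which Xarray's Zarr backend reads dimension names), `units` and `scale_factor`. Whether create_array writes exactly these keys in this form is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (NOT part of the library): build the JSON text of a
// Zarr v2 `.zattrs` file. Assumes names and units need no JSON escaping.
std::string make_zattrs(std::string_view units, double scale_factor,
                        const std::vector<std::string> &dimnames) {
  std::ostringstream json;
  json << "{\"_ARRAY_DIMENSIONS\": [";
  for (std::size_t i = 0; i < dimnames.size(); ++i) {
    if (i > 0) json << ", ";
    json << '"' << dimnames[i] << '"';
  }
  // Default stream formatting prints 1000.0 as "1000", which is valid JSON.
  json << "], \"units\": \"" << units << "\", \"scale_factor\": " << scale_factor << '}';
  return json.str();
}
```

Under the CF conventions, readers such as Xarray decode stored values by multiplying them by scale_factor, so with scale_factor = 1000 and units "m" a stored value of 2.5 is read as 2500 m.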

template<typename T>
inline XarrayZarrArray<Store, T> create_coordinate_array(const std::string_view name, const std::string_view units, const double scale_factor, const size_t chunksize, const size_t dimsize)

Creates a new 1-D array for a coordinate of the dataset.

Template Parameters:

T – The data type of the coordinate array.

Parameters:
  • name – The name of the new coordinate.

  • units – The units of the coordinate.

  • scale_factor – The scale factor of the coordinate data.

  • chunksize – The size of each 1-D chunk of the coordinate array.

  • dimsize – The initial size of the coordinate (i.e. number of elements along the array).

Returns:

An instance of XarrayZarrArray representing the newly created coordinate array.
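Both chunkshape (for create_array) and chunksize (for create_coordinate_array) determine how an array is split into chunks. In the Zarr v2 specification each dimension of an array is divided into ceil(shape / chunk) chunks, the last of which may be only partly filled. A minimal sketch of that relation follows; chunk_grid is a hypothetical helper, not a function of this library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: number of chunks along each dimension of a Zarr array,
// i.e. ceil(shape[i] / chunkshape[i]) for every dimension i.
std::vector<std::size_t> chunk_grid(const std::vector<std::size_t> &shape,
                                    const std::vector<std::size_t> &chunkshape) {
  if (shape.size() != chunkshape.size()) {
    throw std::invalid_argument("shape and chunkshape must have the same rank");
  }
  std::vector<std::size_t> nchunks(shape.size());
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (chunkshape[i] == 0) {
      throw std::invalid_argument("chunk extents must be non-zero");
    }
    nchunks[i] = (shape[i] + chunkshape[i] - 1) / chunkshape[i];  // ceiling division
  }
  return nchunks;
}
```

For example, a coordinate array with dimsize = 10 and chunksize = 4 spans three chunks, the last of which holds only two elements.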

template<typename T>
inline XarrayZarrArray<Store, T> create_ragged_array(const std::string_view name, const std::string_view units, const double scale_factor, const std::vector<size_t> &chunkshape, const std::vector<std::string> &dimnames, const std::string_view sampledimname) const

Creates a new ragged array in the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • name – The name of the new array.

  • units – The units of the array data.

  • scale_factor – The scale factor of the array data.

  • chunkshape – The shape of the chunks of the array.

  • dimnames – The names of each dimension of the array.

  • sampledimname – The name of the sample dimension of the array.

Returns:

An instance of XarrayZarrArray representing the newly created ragged array.

template<typename T>
inline XarrayZarrArray<Store, T> create_raggedcount_array(const std::string_view name, const std::string_view units, const double scale_factor, const std::vector<size_t> &chunkshape, const std::vector<std::string> &dimnames, const std::string_view sampledimname) const

Creates a new raggedcount array in the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • name – The name of the new array.

  • units – The units of the array data.

  • scale_factor – The scale factor of the array data.

  • chunkshape – The shape of the chunks of the array.

  • dimnames – The names of each dimension of the array.

  • sampledimname – The name of the sample dimension of the array.

Returns:

An instance of XarrayZarrArray representing the newly created raggedcount array.
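The sampledimname parameter suggests that create_ragged_array and create_raggedcount_array follow the contiguous ragged array representation of the CF conventions. In that layout the data of all instances are concatenated along one sample dimension, and a separate count array records how many samples belong to each instance (in CF, the count variable names the sample dimension through its sample_dimension attribute). Assuming that layout, the hypothetical function ragged_instance below recovers the samples of one instance from the data and count arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: return the samples of instance `i` from a contiguous
// ragged array `data`, given the per-instance sample `counts`.
template <typename T>
std::vector<T> ragged_instance(const std::vector<T> &data,
                               const std::vector<std::size_t> &counts,
                               std::size_t i) {
  if (i >= counts.size()) {
    throw std::out_of_range("instance index out of range");
  }
  // All counts together must account for every sample in `data`.
  const std::size_t total =
      std::accumulate(counts.begin(), counts.end(), std::size_t{0});
  if (total != data.size()) {
    throw std::invalid_argument("counts must sum to the sample dimension size");
  }
  // Instance i starts after the samples of all preceding instances.
  const std::size_t start = std::accumulate(
      counts.begin(), counts.begin() + static_cast<std::ptrdiff_t>(i), std::size_t{0});
  const auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
  return std::vector<T>(first, first + static_cast<std::ptrdiff_t>(counts[i]));
}
```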

template<typename T>
inline void write_arrayshape(XarrayZarrArray<Store, T> &xzarr) const

Calls the array’s shape function to ensure that the shape of the array matches the dimensions of the dataset.

Template Parameters:

T – The data type of the array.

Parameters:

xzarr – An instance of XarrayZarrArray representing the array.

template<typename T>
inline void write_arrayshape(const std::shared_ptr<XarrayZarrArray<Store, T>> xzarr_ptr) const

Calls the array’s shape function to ensure that the shape of the array matches the dimensions of the dataset.

Template Parameters:

T – The data type of the array.

Parameters:

xzarr_ptr – A shared pointer to the instance of XarrayZarrArray representing the array.
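One way to keep an array's shape consistent with the dataset is to take the size of each of the array's named dimensions from the dataset's own dimension sizes. The function below is a minimal sketch of that look-up, assuming the sizes are held in a map like datasetdims; shape_from_dims is hypothetical and not part of this library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (NOT part of the library): look up the size of each of an
// array's named dimensions, in the order the array lists them. at() throws
// std::out_of_range if a name is not a dimension of the dataset.
std::vector<std::size_t> shape_from_dims(
    const std::unordered_map<std::string, std::size_t> &datasetdims,
    const std::vector<std::string> &dimnames) {
  std::vector<std::size_t> shape;
  shape.reserve(dimnames.size());
  for (const auto &name : dimnames) {
    shape.push_back(datasetdims.at(name));
  }
  return shape;
}
```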

template<typename T>
inline void write_ragged_arrayshape(XarrayZarrArray<Store, T> &xzarr) const

Calls the array’s shape function to write the shape of a ragged array.

Template Parameters:

T – The data type of the array.

Parameters:

xzarr – An instance of XarrayZarrArray representing the array.

template<typename T>
inline void write_to_array(XarrayZarrArray<Store, T> &xzarr, const typename Buffer<T>::viewh_buffer h_data) const

Writes data from a Kokkos view in host memory to a Zarr array in the dataset and calls the function that ensures the shape of the array matches the dimensions of the dataset.

The function writes data to an array in the dataset and updates the metadata for the shape of the array so that the size of each dimension of the array is consistent with the corresponding dimension of the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • xzarr – An instance of XarrayZarrArray representing the array.

  • h_data – The data to be written to the array.

template<typename T>
inline void write_to_array(const std::shared_ptr<XarrayZarrArray<Store, T>> xzarr_ptr, const typename Buffer<T>::viewh_buffer h_data) const

Writes data from a Kokkos view in host memory to a Zarr array in the dataset and calls the function that ensures the shape of the array matches the dimensions of the dataset.

The function writes data to an array in the dataset and updates the metadata for the shape of the array so that the size of each dimension of the array is consistent with the corresponding dimension of the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • xzarr_ptr – A shared pointer to the instance of XarrayZarrArray representing the array.

  • h_data – The data to be written to the array.
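The following sketch illustrates why writing data can require the shape metadata to be updated afterwards. It assumes, purely for illustration, that data are appended along an array's outermost dimension (for example one time step at a time), so each write lengthens that dimension. AppendableArray is hypothetical, and a std::vector stands in for the Kokkos host view (Buffer<T>::viewh_buffer).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical appendable array (NOT XarrayZarrArray): values are kept flat in
// row-major (C) order and, as an assumption of this sketch, new data are
// appended along the outermost dimension.
template <typename T>
class AppendableArray {
 public:
  explicit AppendableArray(std::vector<std::size_t> innershape)
      : innershape_(std::move(innershape)) {
    if (slice_size() == 0) {
      throw std::invalid_argument("inner extents must be non-zero");
    }
  }

  // Append a flat block of values (standing in for a Kokkos host view) and
  // return the updated shape of the array.
  std::vector<std::size_t> append(const std::vector<T> &h_data) {
    if (h_data.size() % slice_size() != 0) {
      throw std::invalid_argument("data must fill whole outer slices");
    }
    data_.insert(data_.end(), h_data.begin(), h_data.end());
    return shape();
  }

  // Full shape: current outermost extent followed by the fixed inner extents.
  std::vector<std::size_t> shape() const {
    std::vector<std::size_t> full{data_.size() / slice_size()};
    full.insert(full.end(), innershape_.begin(), innershape_.end());
    return full;
  }

 private:
  // Number of elements in one slice along the outermost dimension.
  std::size_t slice_size() const {
    return std::accumulate(innershape_.begin(), innershape_.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
  }

  std::vector<std::size_t> innershape_;
  std::vector<T> data_;
};
```

In this picture, the shape returned after each write is what would need to be reconciled with the dataset's dimension sizes.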

template<typename T>
inline void write_to_array(const std::shared_ptr<XarrayZarrArray<Store, T>> xzarr_ptr, const T data) const

Writes a single data element to a Zarr array in the dataset and calls the function that ensures the shape of the array matches the dimensions of the dataset.

The function writes one data element to an array in the dataset and updates the metadata for the shape of the array so that the size of each dimension of the array is consistent with the corresponding dimension of the dataset.

Template Parameters:

T – The data type of the array.

Parameters:
  • xzarr_ptr – A shared pointer to the instance of XarrayZarrArray representing the array.

  • data – The data element to be written to the array.
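The single-element overload of write_to_array performs the same kind of write as the buffer overloads, just for one value. One common way to implement such an overload is to wrap the value in a one-element buffer and delegate, as in the hypothetical ElementWriter below. Whether this library does so is not stated here, and a std::vector again stands in for a Kokkos host view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical writer (NOT the library's Dataset): std::vector stands in for a
// Kokkos host view, and "writing" simply appends to an in-memory vector.
template <typename T>
class ElementWriter {
 public:
  // Buffer overload: write every value in h_data.
  void write(const std::vector<T> &h_data) {
    written_.insert(written_.end(), h_data.begin(), h_data.end());
  }

  // Single-element overload: wrap the value in a one-element buffer and reuse
  // the buffer overload, so both paths share one implementation.
  void write(const T data) { write(std::vector<T>{data}); }

  std::size_t size() const { return written_.size(); }
  const std::vector<T> &values() const { return written_; }

 private:
  std::vector<T> written_;
};
```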

template<typename T>
inline void write_to_ragged_array(XarrayZarrArray<Store, T> &xzarr, const typename Buffer<T>::viewh_buffer h_data) const

Writes data from a Kokkos view in host memory to a ragged Zarr array in the dataset and updates the metadata for the shape of the ragged array.

The function writes data to a ragged array in the dataset and then calls the array’s shape function to write the shape of the ragged array.

Template Parameters:

T – The data type of the array.

Parameters:
  • xzarr – An instance of XarrayZarrArray representing the array.

  • h_data – The data to be written to the array.
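Writing to a ragged array differs from writing to a regular array in that successive writes may contain different numbers of samples. Assuming the CF contiguous ragged array layout, the hypothetical RaggedBuffer below appends each block of samples along the sample dimension and records the block's length in a count array. Keeping both in one object is a simplification of this sketch, whereas this library provides create_raggedcount_array for the count array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contiguous ragged buffer (NOT the library's classes): each write
// appends a variable-length block of samples along the sample dimension and
// records the block's length in a separate count array.
template <typename T>
class RaggedBuffer {
 public:
  void append(const std::vector<T> &h_data) {
    data_.insert(data_.end(), h_data.begin(), h_data.end());
    counts_.push_back(h_data.size());
  }

  // Length of the sample dimension, i.e. the total number of samples written.
  std::size_t sample_dimension_size() const { return data_.size(); }

  const std::vector<T> &data() const { return data_; }
  const std::vector<std::size_t> &counts() const { return counts_; }

 private:
  std::vector<T> data_;
  std::vector<std::size_t> counts_;
};
```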

Private Functions

inline void add_dimension(const std::pair<std::string, size_t> &dim)

Adds a dimension to the dataset.

Parameters:

dim – A pair containing the name and size of the dimension to be added.

Private Members

ZarrGroup<Store> group

The Zarr group object that holds the arrays of the dataset.

std::unordered_map<std::string, size_t> datasetdims

Map from the name of each dimension in the dataset to its size.