ZarrArray

Header file: <libs/zarr/zarr_array.hpp>

inline std::vector<size_t> good2Dchunkshape(const size_t maxchunk, const size_t dim1size)

Given a maximum chunk size ‘maxchunk’ and the length of the inner dimension of one chunk of an array ‘dim1size’, returns the largest possible 2-D chunk shape whose inner dimension has length dim1size.

dim1size must be <= maxchunk. To ensure good chunking, the final length of the inner dimension of the 2-D array should also be an exact integer multiple of dim1size.

Parameters:
  • maxchunk – The maximum chunk size (maximum number of elements in chunk).

  • dim1size – The length of (number of elements along) the inner dimension of one chunk.

Returns:

std::vector<size_t> The largest possible 2-D chunk shape.
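The description above can be illustrated with a minimal sketch. The function name `sketch_good2Dchunkshape` is hypothetical, and the use of integer division for the outer dimension is an assumption based on the description rather than a copy of the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the library's code): the largest 2-D chunk shape
// {dim0, dim1size} such that dim0 * dim1size <= maxchunk.
inline std::vector<std::size_t> sketch_good2Dchunkshape(const std::size_t maxchunk,
                                                        const std::size_t dim1size) {
  // dim1size must be non-zero and must itself fit inside one chunk
  assert(dim1size > 0 && dim1size <= maxchunk && "dim1size must be in (0, maxchunk]");

  // integer division rounds down, so dim0 * dim1size never exceeds maxchunk
  const std::size_t dim0 = maxchunk / dim1size;

  return {dim0, dim1size};
}
```

Under these assumptions, a call such as `sketch_good2Dchunkshape(100, 8)` would give the shape {12, 8}, i.e. 96 elements per chunk.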

template<typename Store>
inline void write_zarray_json(Store &store, std::string_view name, std::string_view metadata)

Write metadata string to a store under a .zarray key.

Writes metadata under the .zarray key in a store for an array called ‘name’. The key and metadata could be anything but, for example, .zarray could be a json file in a file system store (see FSStore) holding the metadata that must exist in order to decode chunks of an array according to the Zarr storage specification version 2 (https://zarr.readthedocs.io/en/stable/spec/v2.html).

Template Parameters:

Store – The type of the store object where the metadata will be written.

Parameters:
  • store – The store object where the metadata will be written.

  • name – The name under which the .zarray key will be stored in the store.

  • metadata – The metadata to write for the .zarray key.
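As a rough illustration of what such a call might do, the sketch below writes a metadata string into a hypothetical in-memory store. `MapStore`, `sketch_write_zarray_json` and `example_zarray_metadata` are invented for this example and are not part of the library. The key layout `<name>/.zarray` follows the Zarr version 2 convention of prefixing `.zarray` with the array's path, and the metadata values are purely illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

// Hypothetical in-memory store (an assumption for this sketch): string keys -> string values.
struct MapStore {
  std::map<std::string, std::string> entries;
};

// Sketch only: stores 'metadata' under the key "<name>/.zarray".
template <typename Store>
inline void sketch_write_zarray_json(Store &store, std::string_view name,
                                     std::string_view metadata) {
  const std::string key = std::string(name) + "/.zarray";
  store.entries[key] = std::string(metadata);
}

// Illustrative Zarr v2 metadata for a 2-D array of little-endian 64-bit floats.
inline std::string example_zarray_metadata() {
  return R"({
  "zarr_format": 2,
  "shape": [24, 8],
  "chunks": [12, 8],
  "dtype": "<f8",
  "order": "C",
  "compressor": null,
  "fill_value": null,
  "filters": null
})";
}
```

With this sketch, writing `example_zarray_metadata()` for an array named "temperature" would create a single entry under the key "temperature/.zarray".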

template<typename Store, typename T>
class ZarrArray

A template class representing a Zarr array.

This class provides functionality to write an array to a specified store via a buffer according to the Zarr storage specification version 2 (https://zarr.readthedocs.io/en/stable/spec/v2.html).

Template Parameters:
  • Store – The type of store where the array will be stored.

  • T – The data type stored in the arrays.

Public Functions

inline ZarrArray(Store &store, const std::string_view name, const std::vector<size_t> &chunkshape, const bool is_backend, const std::vector<size_t> &reduced_arrayshape = std::vector<size_t>({}))

Constructs a ZarrArray object.

Initializes an empty Zarr array in the provided store in order to write chunks of an array to the store via a buffer. The assertions in this constructor ensure chunks have the same number of dimensions as the array. The buffer is exactly the size of 1 chunk, and the chunks’ shape is restricted such that the final array dimensions are exact integer multiples of the chunk dimensions along all but the outermost (0th) dimension of the array. Data written to chunks is assumed to increment along the innermost dimensions first.

Parameters:
  • store – The store where the array will be stored.

  • name – The name of the array.

  • chunkshape – The shape of individual data chunks along each dimension.

  • is_backend – Boolean which is true if the Zarr array is a backend of something else, e.g. xarray.

  • reduced_arrayshape – The shape of the array along all but the outermost (0th) dimension.
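The restriction on chunk shapes described above could be checked along the following lines. This is a sketch under stated assumptions: the helper name `sketch_chunks_fit_array` is hypothetical, and it assumes that reduced_arrayshape has exactly one fewer dimension than chunkshape (so that an empty reduced_arrayshape corresponds to a 1-D array). The actual constructor uses assertions whose exact form is not shown in this documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): returns true if every inner
// dimension of the array is an exact integer multiple of the matching chunk
// dimension. chunkshape covers all dimensions; reduced_arrayshape covers all
// but the outermost (0th) dimension.
inline bool sketch_chunks_fit_array(const std::vector<std::size_t> &chunkshape,
                                    const std::vector<std::size_t> &reduced_arrayshape) {
  // assumption: the reduced shape has one fewer dimension than the chunk shape
  if (chunkshape.empty() || reduced_arrayshape.size() + 1 != chunkshape.size()) {
    return false;
  }

  for (std::size_t d = 0; d < reduced_arrayshape.size(); ++d) {
    const std::size_t chunklen = chunkshape[d + 1];  // skip outermost chunk dimension
    if (chunklen == 0 || reduced_arrayshape[d] % chunklen != 0) {
      return false;
    }
  }
  return true;
}
```

For example, chunks of shape {12, 8} would fit an array whose inner dimension has length 16, but not one whose inner dimension has length 20.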

inline ~ZarrArray()

Destroys the ZarrArray object.

Writes the buffer to a chunk of the array in the store if it isn’t empty and issues a warning if the data in buffer mismatches the array’s expected dimensions. If the array is not a backend (e.g. of an array in an xarray or NetCDF dataset), then the metadata for the array’s shape is also updated and warnings are issued if the array is incomplete.

inline size_t get_totnchunks()

Get the total number of chunks currently written to array in store.

Returns:

The total number of chunks.

inline size_t get_totalndata()

Get the total number of data elements currently written to array in store and in buffer.

Includes data still held in the buffer, so the value may differ from totndata.

Returns:

The total number of data elements.

inline void write_arrayshape(const std::vector<size_t> &arrayshape)

Write the array shape to the store.

This function writes the given array shape to the store as part of the metadata in the Zarr .zarray json file. The function also asserts that the number of dimensions of the given arrayshape is consistent with the number of dimensions of the shape of each chunk.

Parameters:

arrayshape – The array shape to be written.

Pre:

The number of dimensions of the provided array shape must be equal to that of the chunk shape. Otherwise, an assertion error is triggered.

inline void write_to_zarr_array(const viewh_buffer h_data)

Writes data from a Kokkos view in host memory to chunks of a Zarr array in a store via a buffer and keeps the metadata in the .zarray json file up-to-date with the written chunks.

First, copies some data from the view to a buffer (until the number of elements in the buffer = chunksize). Second, writes any whole chunks of the array into a store. Third, updates the .zarray json file for the Zarr metadata about the shape of the array accordingly. Finally, copies any leftover data (number of elements < chunksize) into the buffer. An assertion checks that no data remains unattended to.

Parameters:

h_data – The data in a Kokkos view in host memory which should be written to the array in a store.
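The buffering steps described above can be sketched without Kokkos. The example below is a simplified illustration, not the library's implementation: `SketchChunkWriter` is a hypothetical type, `std::vector<double>` stands in for both the Kokkos view and the buffer, an in-memory list of vectors stands in for the store, and chunks are treated as flat runs of `chunksize` elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the buffer and store used when writing chunks.
struct SketchChunkWriter {
  std::size_t chunksize;                    // number of elements in one chunk
  std::vector<double> buffer;               // holds < chunksize elements between calls
  std::vector<std::vector<double>> chunks;  // stands in for chunks written to a store

  explicit SketchChunkWriter(const std::size_t chunksize_) : chunksize(chunksize_) {
    assert(chunksize > 0 && "chunks must hold at least one element");
  }

  void write(const std::vector<double> &data) {
    std::size_t pos = 0;  // index of the first element of data not yet handled

    // 1) top up a partially filled buffer, then flush it if it became full
    if (!buffer.empty()) {
      const std::size_t n = std::min(chunksize - buffer.size(), data.size());
      buffer.insert(buffer.end(), data.data(), data.data() + n);
      pos = n;
    }
    if (buffer.size() == chunksize) {
      chunks.push_back(buffer);
      buffer.clear();
    }

    // 2) write whole chunks directly from the remaining data
    while (data.size() - pos >= chunksize) {
      chunks.emplace_back(data.data() + pos, data.data() + pos + chunksize);
      pos += chunksize;
    }

    // 3) (a metadata update for the array shape would happen here)

    // 4) keep any leftover elements in the buffer for the next call
    buffer.insert(buffer.end(), data.data() + pos, data.data() + data.size());
    assert(buffer.size() < chunksize && "leftover data must be smaller than one chunk");
  }
};
```

With a chunk size of 4, writing six elements and then three more would, in this sketch, produce two whole chunks and leave one element waiting in the buffer.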

inline void write_to_array(const viewh_buffer h_data)

Writes data from a Kokkos view in host memory to chunks of a Zarr array in a store via a buffer. The function does not write metadata to the .zarray json file.

First, copies some data from the view to a buffer (until the number of elements in the buffer = chunksize), then writes any whole chunks of the array into a store. Finally, copies any leftover data (number of elements < chunksize) into the buffer. An assertion checks that no data remains unattended to. The function is useful when using the Zarr array as the backend of a dataset and/or when you do not want to write metadata for the array when writing data elements.

Parameters:

h_data – The data in a Kokkos view in host memory which should be written to the array in a store.

inline void write_to_array(const T data)

Writes 1 element of data to a Zarr array (writing to the store in chunks via a buffer). The function does not write metadata to the .zarray json file.

Copies the data element into the buffer, then writes a whole chunk of the array into the store once the buffer is full (number of elements in the buffer = chunksize). The function is useful when using the Zarr array as the backend of a dataset and/or when you do not want to write metadata for the array when writing data elements.

Parameters:

data – The data element which should be written to the array in a store.

Private Types

using viewh_buffer = Buffer<T>::viewh_buffer
using subviewh_buffer = Buffer<T>::subviewh_buffer

Private Functions

inline std::vector<size_t> get_arrayshape() const

Get the shape of the array based on the number of data elements and chunks written in the store.

This method assumes that writing of chunks always fills inner dimensions first. The array shape returned is always at least large enough to hold all the elements of data written to the array so far along each dimension (i.e., arraysize >= totndata along each dimension of the array).

Returns:

A vector representing the shape of the array.
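One way to picture this calculation is the sketch below. It rests on assumptions that are not confirmed by this documentation: the inner dimensions are taken to be fixed by a reduced array shape, only the outermost dimension grows, and the outermost length is rounded up so that a partially filled outer "row" is still included. The function name `sketch_arrayshape` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the library's code): smallest array shape with the
// given inner dimensions that can hold 'totalndata' elements when inner
// dimensions are filled first.
inline std::vector<std::size_t> sketch_arrayshape(
    const std::size_t totalndata, const std::vector<std::size_t> &reduced_arrayshape) {
  // number of elements in one full slice along the outermost dimension
  std::size_t inner = 1;
  for (const std::size_t len : reduced_arrayshape) {
    inner *= len;
  }
  assert(inner > 0 && "inner dimensions must have non-zero length");

  std::vector<std::size_t> shape;
  shape.push_back((totalndata + inner - 1) / inner);  // ceiling division
  shape.insert(shape.end(), reduced_arrayshape.begin(), reduced_arrayshape.end());
  return shape;
}
```

Under these assumptions, 20 elements written to an array whose inner dimension has length 8 would give the shape {3, 8}, with the last outer slice only partly filled.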

inline subviewh_buffer write_chunks_to_store(const subviewh_buffer h_data)

Writes chunks of data from a Kokkos view in host memory to the Zarr array in a store.

First writes the buffer to a chunk of the array if it’s full. Then writes whole chunks directly from the Kokkos view if the view contains enough elements for whole chunk(s) to be written. Then updates the shape of the array along its outermost dimension with the accumulated change in shape of the array due to the chunks that have been written. Finally returns a (sub)view of the remaining data not written to a chunk (number of elements in subview < chunksize). Note that this function does not ensure the .zarray json file metadata is kept up-to-date with changes to the arrayshape that may occur due to the increase in number of elements of data written to the array during this function call.

Parameters:

h_data – Kokkos view of the data to write to the store in host memory.

Returns:

The remaining data that was not written to chunks.

Private Members

Store &store

Store in which to write the Zarr array.

std::string_view name

Name of array to write in store.

size_t totnchunks

Total number of chunks of array written to store.

size_t totndata

Total number of elements of data in array written to store.

Chunks chunks

Method to write chunks of array in store.

Buffer<T> buffer

Buffer to hold data before writing chunks to store.

ZarrMetadata<T> zarr_metadata

Metadata required for the Zarr array, excluding the array’s shape.

bool is_backend

True if the Zarr array is a backend of something else, e.g. xarray.