Zarr (data format)

Article snapshot taken from [REDACTED] under the Creative Commons Attribution-ShareAlike license.
Zarr
Filename extension: .zarr
Latest release: 3
Type of format: Multidimensional array
Open format?: Yes
Free format?: Yes
Website: zarr.dev

Zarr is an open standard for storing large multidimensional array data. It specifies a protocol and data format, and is designed to be "cloud ready": data is divided into subsets called chunks, which allows random access to parts of an array. Zarr can be used from many programming languages, including Python, Java, JavaScript, C++, Rust and Julia. It has been used by organisations such as Google and Microsoft to publish large datasets.

Zarr is designed to support high-throughput distributed I/O on different storage systems, which is a common requirement in cloud computing. Multiple readers can efficiently access a Zarr array in parallel, and multiple writers can likewise write to it in parallel.
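
One way to see why chunked storage parallelises well is to store each chunk under its own key, so that workers touching different chunks never share a file. The following is a minimal standard-library sketch of that idea, not the Zarr API: the helper names (`chunk_key`, `write_chunk`, `read_chunk`, `demo`) and the dot-separated key naming are assumptions made for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk_key(chunk_index):
    # Hypothetical key scheme: chunk (2, 3) is stored under the name "2.3".
    return ".".join(str(i) for i in chunk_index)


def write_chunk(root, chunk_index, payload):
    # Each chunk is written to its own file, so writers do not contend.
    (root / chunk_key(chunk_index)).write_bytes(payload)


def read_chunk(root, chunk_index):
    return (root / chunk_key(chunk_index)).read_bytes()


def demo():
    """Write and then read a 4 x 4 grid of chunks using a thread pool."""
    grid = [(i, j) for i in range(4) for j in range(4)]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Parallel writes: one independent file per chunk.
            list(pool.map(lambda idx: write_chunk(root, idx, bytes(idx)), grid))
            # Parallel reads of the same chunks.
            payloads = list(pool.map(lambda idx: read_chunk(root, idx), grid))
    return dict(zip(grid, payloads))
```

In this sketch the payload of each chunk is just its own index encoded as bytes; a real store would hold compressed array data instead.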

Format description

The main data structure in Zarr is the multidimensional array. To allow parallel access, each array is stored and accessed as a grid of "chunks". The actual layout of the data on disk depends on the compressor and storage plugins selected by the user.

An illustration of Zarr's chunking data format.
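
The chunk grid described above can be illustrated with a little index arithmetic. This is a sketch under stated assumptions rather than Zarr's own implementation: it assumes a regular grid of equally sized chunks, and the function names are hypothetical.

```python
from itertools import product
from math import ceil


def chunk_grid_shape(shape, chunks):
    # Number of chunks along each dimension (the last chunk may be partial).
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def chunk_of(index, chunks):
    # Grid coordinates of the chunk that holds a given array element.
    return tuple(i // c for i, c in zip(index, chunks))


def chunks_for_region(start, stop, chunks):
    # All chunks overlapping the half-open region [start, stop) in each
    # dimension. Assumes stop > start along every dimension.
    ranges = [
        range(a // c, (b - 1) // c + 1)
        for a, b, c in zip(start, stop, chunks)
    ]
    return list(product(*ranges))
```

For a 10000 x 10000 array with 1000 x 1000 chunks, `chunk_grid_shape` gives a 10 x 10 grid, and reading rows 900 to 1099 of the first 500 columns touches only the two chunks returned by `chunks_for_region((900, 0), (1100, 500), (1000, 1000))`. Only those chunks need to be fetched and decompressed, which is what makes random access cheap.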

Zarr's design was influenced by that of HDF5, and so it includes similar features for metadata and grouping: arrays can be grouped into named hierarchies, and they can also be annotated with key-value metadata stored alongside the array.
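
The grouping and metadata model can be sketched as a tree of named nodes, each carrying its own key-value attributes. The code below is an illustrative standard-library model only: the `Node` class, its `require` method and the `flatten` helper are hypothetical names, and serialising attributes as JSON is an assumption of this sketch, not a statement about how any particular Zarr version lays out its metadata.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """A group or array in a named hierarchy, with key-value metadata."""
    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)

    def require(self, path):
        # Walk a slash-separated path, creating missing nodes on the way.
        node = self
        for name in path.strip("/").split("/"):
            node = node.children.setdefault(name, Node())
        return node


def flatten(node, prefix=""):
    # Map every node's path to its attributes, serialised as JSON text.
    out = {prefix or "/": json.dumps(node.attrs, sort_keys=True)}
    for name, child in node.children.items():
        out.update(flatten(child, f"{prefix}/{name}"))
    return out
```

A usage example: `root = Node()` followed by `root.require("climate/temperature").attrs["units"] = "K"` creates a `climate` group containing a `temperature` node annotated with its units, and `flatten(root)` lists the three resulting paths together with their metadata.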

Applications

For bioimaging such as microscopy, a consortium called the Open Microscopy Environment (OME) created a format called "OME-Zarr", based on Zarr with some discipline-specific extensions. Zarr is also used to publish weather, satellite and energy data, among others.

References

  1. "Zarr - chunked, compressed, N-dimensional arrays". zarr.dev. Retrieved 2024-09-12.
  2. "Cloud-Optimized Geospatial Formats Guide: Zarr". guide.cloudnativegeo.org. Retrieved 2024-09-12.
  3. "Zarr Implementations". zarr.dev. Retrieved 2025-01-09.
  4. "Google Cloud: ERA5 data". cloud.google.com. Retrieved 2024-09-12.
  5. "Microsoft Planetary Computer: Reading Zarr Data". planetarycomputer.microsoft.com. Retrieved 2024-09-12.
  6. "Zarr - Tutorial". zarr.readthedocs.io. Retrieved 2024-09-12.
  7. Moore, Josh (2023). "OME-Zarr: a cloud-optimized bioimaging file format with international community support". Histochemistry and Cell Biology. 160 (3). Springer Science and Business Media LLC: 223–251. doi:10.1007/s00418-023-02209-1. hdl:1721.1/151126. ISSN 1432-119X. PMC 10492740. PMID 37428210.
  8. "Lazy loading: Making it easier to access vast datasets of weather & satellite data". openclimatefix.org. Retrieved 2024-09-12.
  9. Sansal, Altay; Kainkaryam, Sribharath; Lasscock, Ben; Valenciano, Alejandro (2023). "MDIO: Open-source format for multidimensional energy data". The Leading Edge. 42 (7). Society of Exploration Geophysicists: 465–473. Bibcode:2023LeaEd..42..465S. doi:10.1190/tle42070465.1. ISSN 1938-3789.
