H3: An embedded object store
2020-11-22, 13:00–13:30, Main Lecture Hall

H3 is an embedded High speed, High volume, and High availability object store, backed by a high-performance key-value store (RocksDB, Redis, etc.) or a filesystem. H3 is implemented in the h3lib library, which provides a C-based cloud-friendly API, similar to Amazon's S3. Python and Java wrappers are also available. The H3 FUSE filesystem allows object access using file semantics, while the CSI H3 mount plugin (csi-h3 for short) allows you to use H3 FUSE for implementing persistent volumes in Kubernetes. By embedding the object store in the application, thus avoiding the REST layer, we show that data operations gain significant performance benefits. The whole H3 ecosystem is open source, actively developed, and currently used as a key component in the EVOLVE software stack - a project funded by the European Union’s Horizon 2020 research and innovation programme.

H3 is a thin, stateless layer that provides object semantics on top of a high-performance key-value store - a typical service deployed in HPC and Big Data environments. By transitioning a cloud-based S3 service to H3 and running in a cluster, we expect applications to enjoy much faster data operations and - if the key-value store is distributed - to easily scale out across all cluster nodes. In the latter case, the object service is not provided centrally, but everywhere on the cluster. In essence, H3 implements a translation layer between the object namespace and a key-value store, similar to how a filesystem provides a hierarchical namespace of files on top of a block device.
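The idea of a stateless translation layer can be illustrated with a minimal Python sketch. This is not H3's actual key layout or API: the class `ToyObjectStore`, its method names, and the `bucket:` / `object:` key prefixes are all hypothetical, and a plain dictionary stands in for the key-value store.

```python
class ToyObjectStore:
    """Object semantics on top of any dict-like key-value store (illustrative only)."""

    def __init__(self, kv):
        # All state lives in the key-value store; the layer itself keeps none.
        self.kv = kv

    def create_bucket(self, bucket):
        # Assumption of this sketch: bucket names contain no "/", so that
        # "<bucket>/<object>" keys cannot collide across buckets.
        if "/" in bucket:
            raise ValueError("bucket names may not contain '/'")
        key = f"bucket:{bucket}"
        if key in self.kv:
            raise KeyError(f"bucket exists: {bucket}")
        self.kv[key] = b""

    def _object_key(self, bucket, name):
        # Translate (bucket, object name) into one flat key.
        if f"bucket:{bucket}" not in self.kv:
            raise KeyError(f"no such bucket: {bucket}")
        return f"object:{bucket}/{name}"

    def write(self, bucket, name, data):
        self.kv[self._object_key(bucket, name)] = bytes(data)

    def read(self, bucket, name):
        return self.kv[self._object_key(bucket, name)]

    def delete(self, bucket, name):
        del self.kv[self._object_key(bucket, name)]

    def list_objects(self, bucket, prefix=""):
        # Listing becomes a prefix scan over the flat key space.
        start = self._object_key(bucket, prefix)
        skip = len(f"object:{bucket}/")
        return sorted(k[skip:] for k in self.kv if k.startswith(start))


store = ToyObjectStore({})
store.create_bucket("results")
store.write("results", "run1/output.bin", b"\x00\x01")
store.write("results", "run2/output.bin", b"\x02")
print(store.list_objects("results", prefix="run1/"))
```

Because the layer holds no state of its own, any number of such instances can share one distributed key-value store, which is the property that lets the object service run on every cluster node.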

H3 provides a flat organization scheme where each data object is linked to a globally unique identifier called a bucket. Buckets belong to users and contain objects. The H3 API supports typical bucket operations, such as create, list, and delete. Object management includes reading/writing objects from/to H3, copying, renaming, listing, and deleting. H3 also supports multipart operations, where objects are written in parts and coalesced at the end.
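The multipart pattern described above, where parts are written separately and coalesced at the end, might look roughly like the following Python sketch. The class `MultipartUpload` and its methods are hypothetical names chosen for illustration and do not reflect the h3lib API; a dictionary stands in for a bucket.

```python
class MultipartUpload:
    """Collects numbered parts and joins them into one object on completion."""

    def __init__(self, objects, name):
        self.objects = objects  # dict standing in for a bucket: name -> bytes
        self.name = name
        self.parts = {}         # part number -> bytes

    def write_part(self, number, data):
        # Parts may arrive in any order; rewriting a part number replaces it.
        self.parts[number] = bytes(data)

    def complete(self):
        # Coalesce the parts in part-number order into the final object.
        if not self.parts:
            raise ValueError("no parts written")
        self.objects[self.name] = b"".join(
            self.parts[n] for n in sorted(self.parts)
        )
        self.parts = {}

    def abort(self):
        # Discard all parts without creating the object.
        self.parts = {}


bucket = {}
upload = MultipartUpload(bucket, "genome.fa")
upload.write_part(2, b"world")
upload.write_part(1, b"hello ")
upload.complete()
print(bucket["genome.fa"])
```

The object only becomes visible under its final name once `complete()` runs, so readers never observe a half-written object in this sketch.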

H3 is provided as a C library, called h3lib. h3lib implements the object API as a series of functions that convert the bucket and object operations to operations in the provided key-value backend. The key-value store interface is abstracted into a common API with implementations for RocksDB (for single-node runs), Redis (for over-the-network storage), and a filesystem (for easy testing). H3 can easily be used in Python and Java programs, as respective native libraries are available.
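As a rough illustration of abstracting the key-value store behind a common interface, the Python sketch below defines one interface with two interchangeable backends. The names `KVBackend`, `MemoryBackend`, `DirectoryBackend`, and `copy_object` are assumptions made for this example; h3lib's real backend interface is written in C and may differ substantially. The in-memory and directory backends only stand in for stores such as RocksDB or Redis.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from urllib.parse import quote, unquote


class KVBackend(ABC):
    """Hypothetical common interface every backend implements."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...

    @abstractmethod
    def keys(self, prefix=""): ...


class MemoryBackend(KVBackend):
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = bytes(value)

    def get(self, key):
        return self._data[key]

    def delete(self, key):
        del self._data[key]

    def keys(self, prefix=""):
        return sorted(k for k in self._data if k.startswith(prefix))


class DirectoryBackend(KVBackend):
    """One file per key; keys are percent-encoded so '/' is safe in a filename."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # The "k_" prefix avoids special names such as "." and "..".
        return os.path.join(self.root, "k_" + quote(key, safe=""))

    def put(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def keys(self, prefix=""):
        names = (n[2:] for n in os.listdir(self.root) if n.startswith("k_"))
        return sorted(k for k in map(unquote, names) if k.startswith(prefix))


def copy_object(backend, src, dst):
    # Object-level code is written once against the interface.
    backend.put(dst, backend.get(src))


for backend in (MemoryBackend(), DirectoryBackend(tempfile.mkdtemp())):
    backend.put("results/a", b"1")
    copy_object(backend, "results/a", "results/b")
    print(type(backend).__name__, backend.keys("results/"))
```

With this split, swapping the storage technology means supplying a different backend object; the object-level logic such as `copy_object` stays unchanged.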

The H3 ecosystem also includes h3cli (a utility for accessing H3 from the command line), H3 FUSE (a filesystem that allows object access using file semantics), h3-benchmark (to measure H3 performance), the CSI H3 mount plugin (for implementing persistent volumes in Kubernetes using H3), as well as a custom Argo fork (supporting H3 as a workflow artifact repository).

The H3 project is committed to open source and all code is available on GitHub. H3 has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825061 (EVOLVE) and is currently a key component in EVOLVE's user-facing software stack, which also includes other relevant open source projects. EVOLVE brings together best practices from the HPC, Big Data (AI) and Cloud worlds, to build a state-of-the-art platform, which in turn supports numerous pilot applications from diverse industrial and research domains (genomics, automotive, agriculture, maritime, etc.).

See also: Presentation