Introduction to Anvil
Worka Anvil is an open-source distributed storage and compute system designed for simplicity, scalability, and future extensibility. The goal is to provide a system that can start small—a single-node instance that fits into a Docker Compose file—and scale seamlessly to thousands of peers and millions of clients.
What makes Anvil unique is that it isn't just about storage. It is also designed for a world where peers can register compute capabilities. Whether that's machine learning inference, data processing, or other compute-heavy tasks, Anvil treats compute as a first-class citizen alongside storage.
Guiding Principles
Anvil's design is guided by a few core principles:
-
Operational Simplicity: The system has as few moving parts as possible. Beyond the Anvil binary itself, PostgreSQL is the only external dependency.
-
Scalability from One to Many: Anvil is designed to run just as well on a developer's laptop as it does in a multi-node, multi-region cluster. The architecture for a single node is the same as for a thousand nodes.
-
Durability and Security: Data durability is achieved via Reed-Solomon erasure coding. As part of this process, data is also encrypted at rest, providing an essential layer of security.
-
Performance-First: We prioritize performance by using modern, efficient technologies like Rust, Tokio, QUIC, and a two-phase commit process for safe, atomic writes.
Core Technologies
-
Rust (2024 Edition): Provides memory safety, fearless concurrency, and zero-cost abstractions for high-performance, low-level networking and storage code.
-
QUIC (via
libp2p): All peer-to-peer communication, for both cluster gossip and data transfer, happens over QUIC, a modern, secure, and high-performance transport protocol that runs over UDP. -
PostgreSQL (v17+): Postgres is used as the metadata and indexing layer, split into two distinct roles:
- A Global database stores low-volume, high-importance data like tenants, buckets, and security policies.
- Regional databases store the high-volume object metadata for each region, enabling massive horizontal scaling.
-
Tonic and Axum: The API layer is built using
tonicfor the gRPC API andaxumfor the S3-compatible HTTP gateway, served from a single port for simplicity.