Announcing Anvil: The AI-Native, Open-Source Object Store
Fast, self-hosted, S3-compatible storage designed for model weights (safetensors, GGUF, ONNX) and large ML datasets.
GitHub: https://github.com/worka-ai/anvil
Latest Release: https://github.com/worka-ai/anvil/releases/latest
Docs: https://worka.ai/docs/anvil/getting-started
Landing Page: https://worka.ai/anvil
Why We Built Anvil
We didn’t set out to build a new object store.
We set out to build our app — and everything broke in predictable, painful ways.
- Git LFS choked on multi‑GB LLM model files
- Hugging Face repos weren’t ideal for private/internal hosting
- S3 and MinIO treated model files as dumb blobs
- Fine‑tunes duplicated base checkpoints 10–20×
- Downloading 7B/13B files repeatedly wrecked developer velocity
- Users couldn’t run models locally without full downloads
- Serving models from home labs, laptops, and edge devices was unreliable
There was no storage layer designed for AI workloads — only general-purpose object stores that weren’t aware of model formats or inference patterns.
So we built one.
What If Storage Was Designed for AI?
Imagine a storage layer that:
- Understands safetensors, GGUF, and ONNX
- Streams just the layers or tensors you need
- Uses erasure coding instead of 3× replication
- Lets you run LLMs on machines that don’t have enough GPU RAM
- Deduplicates fine‑tunes and LoRAs
- Lets you split buckets across regions with zero friction
- Works offline, on‑prem, hybrid, and cloud
- Uses QUIC for low‑latency tensor streaming
This is what Anvil is built for.
Introducing Anvil
Anvil is a self-hosted, S3-compatible, open-source object store designed for AI-native workloads.
It supports model-aware indexing, tensor streaming, multi-region clustering, erasure coding, and a native gRPC API — while remaining fully compatible with the AWS CLI and standard S3 libraries.
It’s simple enough to run on a laptop.
It’s powerful enough to run an 18‑node production cluster (we do).
Key Features
🧠 Model-Aware Storage
- Native parsing of safetensors, GGUF, and ONNX
- Tensor-level indexing
- Fetch individual tensors/layers by name
- Avoid full-model downloads for inference
⚡ High-Performance Streaming
- QUIC-powered partial reads
- Zero‑copy mmap for local access
- Large reductions in LLM cold‑start times
💾 Efficient Storage for Fine‑Tunes
- Deduplicated base weights + LoRA deltas (see the sketch after this list)
- Massive space savings (e.g., a 100 GB base model plus two full fine‑tunes stores as ~150 GB instead of 300 GB)
- Automatic manifest-based versioning
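A toy sketch of the idea behind this dedup, assuming content-addressed blocks (the 4 MiB block size, SHA-256 hashing, and `put` helper are illustrative, not Anvil's actual on-disk format): identical blocks shared between a base checkpoint and its fine‑tunes are stored once, and each version keeps only a manifest of block hashes.

```python
import hashlib

BLOCK = 4 * 1024 * 1024        # assumed 4 MiB block size
store: dict[str, bytes] = {}   # content hash -> block, stored once

def put(data: bytes) -> list[str]:
    """Chunk a checkpoint into blocks and return its manifest of hashes."""
    manifest = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)  # duplicate blocks cost nothing extra
        manifest.append(key)
    return manifest
```

A fine‑tune that leaves the base weights untouched (as LoRA does) produces a manifest referencing mostly existing blocks, which is where the savings come from.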
🧰 Fully S3 Compatible
- Works with AWS CLI, boto3, and s3cmd (boto3 example below)
- Drop-in replacement for S3 or MinIO
- No vendor lock‑in, fully self-hosted
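Because the API surface is standard S3, pointing boto3 at a local Anvil endpoint looks exactly like pointing it at S3 or MinIO. A minimal sketch (the endpoint URL and credentials are placeholders):

```python
import boto3

# Any S3 client works; only the endpoint and credentials change.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="ANVIL_KEY",        # placeholder credentials
    aws_secret_access_key="ANVIL_SECRET",
)
s3.upload_file("model.safetensors", "models", "model.safetensors")

# Standard ranged GETs work too, so clients can pull just a slice:
part = s3.get_object(
    Bucket="models",
    Key="model.safetensors",
    Range="bytes=0-1048575",  # first 1 MiB
)["Body"].read()
```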
🌐 Clustered & Multi-Region
- libp2p-based peer discovery
- Erasure-coded replication
- Multi-region support for isolation and locality
- Scale from 1 node → 20+ nodes
🛠 Simple Deployment
```bash
docker compose up -d
```
🔒 Open Source
Licensed under Apache-2.0.
Architecture Overview
Storage Layer
Anvil splits objects into blocks, encodes them with Reed‑Solomon parity, and distributes them across the cluster for durability. Unlike classic 3× replication, which stores three full copies of every object, Anvil achieves equivalent resilience at roughly 1.5× storage overhead.
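To make the overhead concrete, take an assumed 8 + 4 layout (8 data blocks, 4 parity blocks; Anvil's actual parameters may be configured differently):

```python
k, m = 8, 4                 # assumed example: 8 data + 4 parity blocks
overhead = (k + m) / k      # bytes stored per byte of user data -> 1.5
survivable = m              # any 4 of the 12 blocks can be lost safely
print(f"{overhead}x overhead, tolerates loss of any {survivable} blocks")
```

Three full replicas cost 3× storage to tolerate two lost copies; the 8 + 4 code costs 1.5× and tolerates the loss of any four blocks.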
Model Indexing Pipeline
When you upload a safetensors, GGUF, or ONNX file, Anvil extracts:
- tensor names
- offsets
- dtypes
- shapes
- metadata
This enables partial reads via gRPC or HTTP range requests.
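For intuition, here is a minimal sketch of the kind of index this pipeline can build for a safetensors file. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header of names, dtypes, shapes, and byte offsets); the function name and output shape are illustrative, not Anvil's internals:

```python
import json
import struct

def index_safetensors(path: str) -> dict:
    """Map tensor name -> dtype, shape, and absolute byte range."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    data_start = 8 + header_len       # offsets are relative to this point
    return {
        name: {
            "dtype": info["dtype"],
            "shape": info["shape"],
            "byte_range": (data_start + info["data_offsets"][0],
                           data_start + info["data_offsets"][1]),
        }
        for name, info in header.items()
    }
```

With an index like this, serving one tensor is just a range read of its `byte_range`.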
Multi-Region Layout
Global metadata lives in a single global Postgres instance.
Each region runs its own Postgres for objects and indices.
Nodes gossip via libp2p.
Streaming
Tensor requests are served via:
- Zero‑copy mmap (local)
- QUIC streams (remote)
This lets inference load only the tensors it needs.
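For the local path, the technique is straightforward: map the file and hand inference a view into it instead of copying bytes. A rough sketch (the `load_tensor` helper and its arguments are illustrative; the byte range would come from an index like the one above):

```python
import mmap
import numpy as np

def load_tensor(path, byte_range, dtype, shape):
    """Return a numpy view over a mapped file region -- no copy made."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    start, end = byte_range
    view = memoryview(mm)[start:end]            # view into the mapping
    return np.frombuffer(view, dtype=dtype).reshape(shape)
```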
Code Examples
Upload a safetensors file
```bash
aws --endpoint-url http://localhost:9000 s3 cp model.safetensors s3://models/
```
Stream a tensor
```python
# Fetch a single tensor by name instead of downloading the whole model.
from anvilml import Model

m = Model("s3://models/llama3.safetensors")
q = m.get_tensor("layers.12.attn.q_proj.weight")
```
Deploy a node
```bash
docker compose up -d
```
Production Readiness
We run a live 18-node cluster powering an AI application that streams models to user devices on demand.
Anvil has handled:
- multi‑GB model transfers
- multi‑region mirrors
- erasure-coded repairs
- real inference workloads
The system is stable and actively maintained.
Roadmap
- Kubernetes operator
- PyTorch & TensorFlow native file system backends
- vLLM integration
- Expanded GGUF indexing
- Dataset chunking and semantic indexing
- Inline compute-in-proximity scheduling
- Optional encrypted storage
Get Started
⭐ GitHub
https://github.com/worka-ai/anvil
📦 Quickstart
https://worka.ai/docs/anvil/getting-started
🌍 Landing Page
https://worka.ai/anvil
🚀 Latest Release
https://github.com/worka-ai/anvil/releases/latest
Final Thoughts
Anvil exists because nothing out there solved our needs as a small AI team working with large models.
We're open-sourcing it because we think others are hitting the same walls.
If you build ML applications, serve models, handle fine‑tunes, or run self-hosted AI infra — we’d love to hear your feedback, ideas, or contributions.
Thanks for reading — and happy self-hosting!
