Announcing Anvil: The AI-Native, Open-Source Object Store
Fast, self-hosted, S3-compatible storage designed for model weights (safetensors, GGUF, ONNX) and large ML datasets.
GitHub: https://github.com/worka-ai/anvil
Latest Release: https://github.com/worka-ai/anvil/releases/latest
Docs: https://worka.ai/docs/anvil/getting-started
Landing Page: https://worka.ai/anvil
Why We Built Anvil
We didn’t set out to build a new object store.
We set out to build our app — and everything broke in predictable, painful ways.
- Git LFS choked on multi‑GB LLM model files
- Hugging Face repos weren’t ideal for private/internal hosting
- S3 and MinIO treated model files as dumb blobs
- Fine‑tunes duplicated base checkpoints 10–20×
- Downloading 7B/13B files repeatedly wrecked developer velocity
- Users couldn’t run models locally without full downloads
- Serving models from home labs, laptops, and edge devices was unreliable
There was no storage layer designed for AI workloads — only general-purpose object stores that weren’t aware of model formats or inference patterns.
So we built one.
What If Storage Was Designed for AI?
Imagine a storage layer that:
- Understands safetensors, GGUF, and ONNX
- Streams just the layers or tensors you need
- Uses erasure coding instead of 3× replication
- Lets you run LLMs on machines that don’t have enough GPU RAM
- Deduplicates fine‑tunes and LoRAs
- Lets you split buckets across regions with zero friction
- Works offline, on‑prem, hybrid, and cloud
- Uses QUIC for low‑latency tensor streaming
This is what Anvil is built for.
Introducing Anvil
Anvil is a self-hosted, S3-compatible, open-source object store designed for AI-native workloads.
It supports model-aware indexing, tensor streaming, multi-region clustering, erasure coding, and a native gRPC API — while remaining fully compatible with the AWS CLI and standard S3 libraries.
It’s simple enough to run on a laptop.
It’s powerful enough to run an 18‑node production cluster (we do).
Key Features
🧠 Model-Aware Storage
- Native parsing of safetensors, GGUF, and ONNX
- Tensor-level indexing
- Fetch individual tensors/layers by name
- Avoid full-model downloads for inference
⚡ High-Performance Streaming
- QUIC-powered partial reads
- Zero‑copy mmap for local access
- Large reductions in LLM cold‑start times
💾 Efficient Storage for Fine‑Tunes
- Deduplicated base weights + LoRA deltas (see the sketch after this list)
- Massive space savings (e.g., a 100 GB base model plus two full fine‑tunes stores as ~150 GB instead of 300 GB)
- Automatic manifest-based versioning
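A toy sketch of the idea behind this dedup, assuming content-addressed blocks (the 4 MiB block size, SHA-256 hashing, and `put` helper are illustrative, not Anvil's actual on-disk format): identical blocks shared between a base checkpoint and its fine‑tunes are stored once, and each version keeps only a manifest of block hashes.

```python
import hashlib

BLOCK = 4 * 1024 * 1024        # assumed 4 MiB block size
store: dict[str, bytes] = {}   # content hash -> block, stored once

def put(data: bytes) -> list[str]:
    """Chunk a checkpoint into blocks and return its manifest of hashes."""
    manifest = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)  # duplicate blocks cost nothing extra
        manifest.append(key)
    return manifest
```

A fine‑tune that leaves the base weights untouched (as LoRA does) produces a manifest referencing mostly existing blocks, which is where the savings come from.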
🧰 Fully S3 Compatible
- Works with AWS CLI, boto3, and s3cmd (boto3 example below)
- Drop-in replacement for S3 or MinIO
- No vendor lock‑in, fully self-hosted
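Because the API surface is standard S3, pointing boto3 at a local Anvil endpoint looks exactly like pointing it at S3 or MinIO. A minimal sketch (the endpoint URL and credentials are placeholders):

```python
import boto3

# Any S3 client works; only the endpoint and credentials change.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="ANVIL_KEY",        # placeholder credentials
    aws_secret_access_key="ANVIL_SECRET",
)
s3.upload_file("model.safetensors", "models", "model.safetensors")

# Standard ranged GETs work too, so clients can pull just a slice:
part = s3.get_object(
    Bucket="models",
    Key="model.safetensors",
    Range="bytes=0-1048575",  # first 1 MiB
)["Body"].read()
```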
🌐 Clustered & Multi-Region
- libp2p-based peer discovery
- Erasure-coded replication
- Multi-region support for isolation and locality
- Scale from 1 node → 20+ nodes
🛠 Simple Deployment
```bash
docker compose up -d
```
🔒 Open Source
Licensed under Apache-2.0.
Architecture Overview
Storage Layer
Anvil splits objects into blocks, encodes them with Reed‑Solomon parity, and distributes them across the cluster for durability. Unlike classic 3× replication, which stores three full copies of every object, Anvil achieves equivalent resilience at roughly 1.5× storage overhead.
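To make the overhead concrete, take an assumed 8 + 4 layout (8 data blocks, 4 parity blocks; Anvil's actual parameters may be configured differently):

```python
k, m = 8, 4                 # assumed example: 8 data + 4 parity blocks
overhead = (k + m) / k      # bytes stored per byte of user data -> 1.5
survivable = m              # any 4 of the 12 blocks can be lost safely
print(f"{overhead}x overhead, tolerates loss of any {survivable} blocks")
```

Three full replicas cost 3× storage to tolerate two lost copies; the 8 + 4 code costs 1.5× and tolerates the loss of any four blocks.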
Model Indexing Pipeline
When you upload a safetensors, GGUF, or ONNX file, Anvil extracts:
- tensor names
- offsets
- dtypes
- shapes
- metadata
This enables partial reads via gRPC or HTTP range requests.
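For intuition, here is a minimal sketch of the kind of index this pipeline can build for a safetensors file. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header of names, dtypes, shapes, and byte offsets); the function name and output shape are illustrative, not Anvil's internals:

```python
import json
import struct

def index_safetensors(path: str) -> dict:
    """Map tensor name -> dtype, shape, and absolute byte range."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    data_start = 8 + header_len       # offsets are relative to this point
    return {
        name: {
            "dtype": info["dtype"],
            "shape": info["shape"],
            "byte_range": (data_start + info["data_offsets"][0],
                           data_start + info["data_offsets"][1]),
        }
        for name, info in header.items()
    }
```

With an index like this, serving one tensor is just a range read of its `byte_range`.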
Multi-Region Layout
Global metadata lives in a single global Postgres instance.
Each region runs its own Postgres for objects and indices.
Nodes gossip via libp2p.
Streaming
Tensor requests are served via:
- Zero‑copy mmap (local)
- QUIC streams (remote)
This lets inference load only the tensors it needs.
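For the local path, the technique is straightforward: map the file and hand inference a view into it instead of copying bytes. A rough sketch (the `load_tensor` helper and its arguments are illustrative; the byte range would come from an index like the one above):

```python
import mmap
import numpy as np

def load_tensor(path, byte_range, dtype, shape):
    """Return a numpy view over a mapped file region -- no copy made."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    start, end = byte_range
    view = memoryview(mm)[start:end]            # view into the mapping
    return np.frombuffer(view, dtype=dtype).reshape(shape)
```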
Code Examples
Upload a safetensors file
```bash
aws --endpoint-url http://localhost:9000 s3 cp model.safetensors s3://models/
```
Stream a tensor
```python
# Fetch a single tensor by name instead of downloading the whole model.
from anvilml import Model

m = Model("s3://models/llama3.safetensors")
q = m.get_tensor("layers.12.attn.q_proj.weight")
```
Deploy a node
```bash
docker compose up -d
```
Production Readiness
We run a live 18-node cluster powering an AI application that streams models to user devices on demand.
Anvil has handled:
- multi‑GB model transfers
- multi‑region mirrors
- erasure-coded repairs
- real inference workloads
The system is stable and actively maintained.
Roadmap
- Kubernetes operator
- PyTorch & TensorFlow native file system backends
- vLLM integration
- Expanded GGUF indexing
- Dataset chunking and semantic indexing
- Inline compute-in-proximity scheduling
- Optional encrypted storage
Get Started
⭐ GitHub
https://github.com/worka-ai/anvil
📦 Quickstart
https://worka.ai/docs/anvil/getting-started
🌍 Landing Page
https://worka.ai/anvil
🚀 Latest Release
https://github.com/worka-ai/anvil/releases/latest
Final Thoughts
Anvil exists because nothing out there solved our needs as a small AI team working with large models.
We're open-sourcing it because we think others are hitting the same walls.
If you build ML applications, serve models, handle fine‑tunes, or run self-hosted AI infra — we’d love to hear your feedback, ideas, or contributions.
Thanks for reading — and happy self-hosting!
