
Announcing Anvil: The AI-Native, Open-Source Object Store

Courtney Robinson · Co-founder, engineer · 4 min read

Introducing Anvil — the AI-Native Object Store

Fast, self-hosted, S3-compatible storage designed for models, safetensors, gguf files, ONNX artifacts, and large ML datasets.

GitHub: https://github.com/worka-ai/anvil
Latest Release: https://github.com/worka-ai/anvil/releases/latest
Docs: https://worka.ai/docs/anvil/getting-started
Landing Page: https://worka.ai/anvil


Why We Built Anvil

We didn’t set out to build a new object store.
We set out to build our app — and everything broke in predictable, painful ways.

  • Git LFS choked on multi‑GB LLM model files
  • Hugging Face repos weren’t ideal for private/internal hosting
  • S3 and MinIO treated model files as dumb blobs
  • Fine‑tunes duplicated base checkpoints 10–20×
  • Downloading 7B/13B files repeatedly wrecked developer velocity
  • Users couldn’t run models locally without full downloads
  • Serving models from home labs, laptops, and edge devices was unreliable

There was no storage layer designed for AI workloads — only general-purpose object stores that weren’t aware of model formats or inference patterns.

So we built one.


What If Storage Was Designed for AI?

Imagine a storage layer that:

  • Understands safetensors, gguf, and onnx
  • Streams just the layers or tensors you need
  • Uses erasure coding instead of 3× replication
  • Lets you run LLMs on machines that don’t have enough GPU RAM
  • Deduplicates fine‑tunes and LoRAs
  • Lets you split buckets across regions with zero friction
  • Works offline, on‑prem, hybrid, and cloud
  • Uses QUIC for low‑latency tensor streaming

This is what Anvil is built for.


Introducing Anvil

Anvil is a self-hosted, S3-compatible, open-source object store designed for AI-native workloads.
It supports model-aware indexing, tensor streaming, multi-region clustering, erasure coding, and a native gRPC API — while remaining fully compatible with the AWS CLI and standard S3 libraries.

It’s simple enough to run on a laptop.
It’s powerful enough to run an 18‑node production cluster (we do).


Key Features

🧠 Model-Aware Storage

  • Native parsing of safetensors, gguf, and onnx
  • Tensor-level indexing
  • Fetch individual tensors/layers by name
  • Avoid full-model downloads for inference

⚡ High-Performance Streaming

  • QUIC-powered partial reads
  • Zero‑copy mmap for local access
  • Huge cold-start reductions for LLMs

💾 Efficient Storage for Fine‑Tunes

  • Deduplicated base weights + LoRA deltas
  • Major space savings (e.g. a set of related 100 GB checkpoints stored in ~150 GB instead of 300 GB)
  • Automatic manifest-based versioning

🧰 Fully S3 Compatible

  • Works with AWS CLI, boto3, s3cmd
  • Drop-in replacement for S3 or MinIO
  • No vendor lock‑in, fully self-hosted
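
As a sketch of what drop-in compatibility looks like, here is standard boto3 pointed at a local Anvil endpoint. The endpoint URL matches the example further down; the credentials are placeholders, not Anvil defaults.

import boto3

# Point a standard S3 client at a local Anvil endpoint
# (placeholder credentials — use whatever your deployment configures).
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="ANVIL_ACCESS_KEY",
    aws_secret_access_key="ANVIL_SECRET_KEY",
)

s3.create_bucket(Bucket="models")
s3.upload_file("model.safetensors", "models", "model.safetensors")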

🌐 Clustered & Multi-Region

  • libp2p-based peer discovery
  • Erasure-coded replication
  • Multi-region support for isolation and locality
  • Scale from 1 node → 20+ nodes

🛠 Simple Deployment

docker compose up -d

🔒 Open Source

Licensed under Apache-2.0.


Architecture Overview

Storage Layer

Anvil splits objects into blocks, encodes them with Reed‑Solomon parity, and distributes them across the cluster for durability. Unlike classic 3× replication, this achieves comparable resilience with roughly 1.5× storage overhead.
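
To make the overhead figure concrete, here is the arithmetic for one hypothetical Reed‑Solomon layout. The exact data/parity split Anvil uses isn't stated here, so treat k and m as assumptions.

# Hedged sketch: why erasure coding lands around 1.5x overhead.
k, m = 8, 4                      # 8 data blocks + 4 parity blocks per stripe (assumed)
overhead = (k + m) / k           # 12 / 8 = 1.5x raw bytes stored per object
tolerated_losses = m             # any 4 of the 12 blocks can be lost and the
                                 # object can still be reconstructed
print(f"{overhead:.1f}x overhead, survives {tolerated_losses} lost blocks")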

Model Indexing Pipeline

When a safetensors, gguf, or onnx file is uploaded, Anvil extracts:

  • tensor names
  • offsets
  • dtypes
  • shapes
  • metadata

This enables partial reads over gRPC or standard S3 range requests.
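
For intuition, here is a minimal sketch of how a safetensors header exposes this information, using the format's standard layout (an 8-byte little-endian header length followed by a JSON table). It illustrates the data an indexer works with, not Anvil's actual pipeline.

import json, struct

# Read the safetensors header: 8-byte header length, then a JSON table
# mapping tensor names to dtype, shape, and byte offsets.
with open("model.safetensors", "rb") as f:
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))

for name, info in header.items():
    if name == "__metadata__":          # optional free-form metadata block
        continue
    print(name, info["dtype"], info["shape"], info["data_offsets"])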

Multi-Region Layout

Global metadata lives in a global Postgres instance.
Each region runs its own regional Postgres for object and index metadata.
Nodes gossip via libp2p.

Streaming

Tensor requests are served via:

  • Zero‑copy mmap (local)
  • QUIC streams (remote)

This lets inference load only the tensors it needs.
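
As an illustration of the local zero-copy path, the sketch below maps a file and views one tensor's byte range directly. The offset, dtype, and shape are hypothetical values that would normally come from the tensor index.

import mmap
import numpy as np

# Map the file and view one tensor's bytes without copying the rest.
with open("model.safetensors", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

offset, dtype, shape = 1_048_576, np.float16, (4096, 4096)   # assumed index entry
count = shape[0] * shape[1]
tensor = np.frombuffer(mm, dtype=dtype, count=count, offset=offset).reshape(shape)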


Code Examples

Upload a safetensors file

aws --endpoint-url http://localhost:9000 s3 cp model.safetensors s3://models/

Stream a tensor

from anvilml import Model

# Open a model stored in Anvil and fetch a single tensor by name
m = Model("s3://models/llama3.safetensors")
q = m.get_tensor("layers.12.attn.q_proj.weight")
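
The same partial read can also be expressed as a plain S3 range request with boto3. This is a sketch: the byte offsets are hypothetical and would come from the tensor index described above.

import boto3

# Fetch only one tensor's byte range from the object via an S3 Range request.
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")
resp = s3.get_object(
    Bucket="models",
    Key="llama3.safetensors",
    Range="bytes=1048576-34603007",   # hypothetical offsets for one tensor
)
tensor_bytes = resp["Body"].read()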

Deploy a node

docker compose up -d

Production Readiness

We run a live 18-node cluster powering an AI application that streams models to user devices on demand.
Anvil has handled:

  • multi‑GB model transfers
  • multi‑region mirrors
  • erasure-coded repairs
  • real inference workloads

The system is stable and actively maintained.


Roadmap

  • Kubernetes operator
  • PyTorch & TensorFlow native file system backends
  • vLLM integration
  • Expanded gguf indexing
  • Dataset chunking and semantic indexing
  • Inline compute-in-proximity scheduling
  • Optional encrypted storage

Get Started

⭐ GitHub

https://github.com/worka-ai/anvil

📦 Quickstart

https://worka.ai/docs/anvil/getting-started

🌍 Landing Page

https://worka.ai/anvil

🚀 Latest Release

https://github.com/worka-ai/anvil/releases/latest


Final Thoughts

Anvil exists because nothing out there solved our needs as a small AI team working with large models.
We're open-sourcing it because we think others are hitting the same walls.

If you build ML applications, serve models, handle fine‑tunes, or run self-hosted AI infra — we’d love to hear your feedback, ideas, or contributions.

Thanks for reading — and happy self-hosting!