HA Milvus on Linux VMs Without Kubernetes

Introduction

Vector search has become a working tool for production search, recommendations, RAG pipelines, and similarity workloads. Milvus is one of the leading open-source vector databases, and most teams running it at scale are running it the way the docs recommend — through the Milvus Operator on a Kubernetes cluster.

That’s a great fit when you already have Kubernetes. It’s a poor fit when you don’t.

A surprising number of teams sit between two extremes. They’ve outgrown single-host Milvus standalone (one VM, no replicas, downtime when the VM blinks) but they don’t run Kubernetes — and don’t want to bring it in just to host a vector database. They have a fleet of plain Linux VMs: physical, virtual, on-prem, in a private subnet, on a cloud provider, in a regulated environment where Kubernetes isn’t permitted, or simply on a budget where a full operator deploy is overkill.

milvus-onprem is for those teams. It’s a single CLI that deploys high-availability Milvus across N Linux VMs without Kubernetes. The cluster runs on plain Docker Compose with a small Python control-plane daemon co-located on every peer. There is no orchestrator. Every peer is identical. The cluster is self-organizing.

This post walks through the design and the trade-offs.

What “no orchestrator” actually means

Kubernetes solves a real problem. It manages container lifecycle across a fleet of hosts, schedules work onto healthy nodes, drains failed ones, and exposes a single API for cluster state. The Milvus Operator builds on top of this to manage Milvus-specific lifecycle: rolling upgrades, replica counts, backups, and so on.

When you remove Kubernetes, you don’t get to skip those problems — you have to solve them another way. milvus-onprem’s answer:

Container lifecycle is handled by Docker’s restart: always policy plus a per-peer health watcher. Docker brings back containers that exit, the watcher restarts unhealthy containers in place, and broken peers surface to operators as journal alerts.
Cluster state lives in etcd. Every peer runs an etcd member. Cluster-wide configuration is stored as etcd keys; topology changes are coordinated through etcd transactions.
Coordination is handled by leader election. Every peer runs the control-plane daemon; one is elected leader via an etcd lease; the leader serves writes and the followers serve reads, redirecting writes to the leader.
Object storage runs as distributed MinIO across all peers. With the wide-pool layout (init --ha-cluster-size=N, covered below), the drives are erasure-coded together, so losing a host doesn’t take the bucket offline.
Load balancing runs as nginx Layer-4 on every peer. Clients connect to any peer’s nginx and are routed to a healthy Milvus.

A 3-VM cluster ends up with 3 etcd members, 3 distributed MinIO drives, 3 Milvus instances behind 3 nginx LBs, and 3 control-plane daemons electing a leader. Lose any one peer and the other two stay healthy and serving.
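The "lose any one peer" guarantee for cluster state comes from Raft’s majority rule: etcd commits a write only once floor(N/2) + 1 voting members acknowledge it. A quick sketch of that arithmetic (plain Raft math, not code taken from milvus-onprem):

```bash
#!/usr/bin/env bash
# Raft majority arithmetic for an etcd cluster with N voting members.
# A write commits once floor(N/2) + 1 members acknowledge it.

etcd_quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Members that can fail while a quorum still remains.
etcd_tolerated_failures() {
  local members=$1
  echo $(( members - $(etcd_quorum "$members") ))
}

for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerates=%d\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_tolerated_failures "$n")"
done
```

The output also shows why even member counts buy nothing: a fourth peer raises the quorum to 3 without raising tolerance, so surviving two simultaneous losses takes at least five etcd members.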

The shape per peer

Each VM runs five containers (a Milvus 2.5 cluster adds a Pulsar container on one designated peer):

  flowchart TB
  subgraph p["one peer"]
    direction TB
    LB["nginx :19537<br/>L4 load balancer"]
    M["Milvus :19530"]
    D["control-plane :19500"]
    E[("etcd :2379")]
    S[("MinIO :9000")]
  end
  Client["pymilvus client"] -->|"gRPC"| LB
  LB --> M
  M --> E
  M --> S
  D --> E

Every peer carries the full set. Across peers, the etcd members form one Raft group, the MinIO drives erasure-code together (in the wide-pool layout), the control-plane daemons take part in leader election, and the Milvus instances share the same etcd metadata and MinIO segment storage, so together they form a single logical Milvus cluster. nginx balances at Layer 4, so it round-robins client connections (and the gRPC calls carried on them) across every peer’s Milvus.
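To make the Layer-4 piece concrete, the sketch below renders an nginx stream block that round-robins TCP connections from the LB port (19537) to every peer’s Milvus port (19530). The function name, upstream name, peer IPs, and the max_fails/fail_timeout values are assumptions for illustration, not milvus-onprem’s actual template.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render an nginx L4 (stream) config that balances
# TCP connections from LISTEN_PORT across every peer's Milvus port.
# Illustration only; not milvus-onprem's real template.

# render_nginx_stream LISTEN_PORT MILVUS_PORT PEER_IP...
render_nginx_stream() {
  local listen_port=$1; shift
  local milvus_port=$1; shift
  local ip
  echo "stream {"
  echo "    upstream milvus_peers {"
  for ip in "$@"; do
    # Round-robin is nginx's default method for an upstream group.
    # max_fails/fail_timeout: skip a peer briefly after failed connects.
    echo "        server ${ip}:${milvus_port} max_fails=2 fail_timeout=10s;"
  done
  echo "    }"
  echo "    server {"
  echo "        listen ${listen_port};"
  echo "        proxy_pass milvus_peers;"
  echo "    }"
  echo "}"
}

render_nginx_stream 19537 19530 10.0.0.11 10.0.0.12 10.0.0.13
```

Because balancing happens per TCP connection, one long-lived pymilvus channel stays on the same Milvus until it reconnects; max_fails and fail_timeout give nginx passive failure detection, temporarily skipping a peer whose connection attempts keep failing.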

Milvus 2.6 added an embedded WAL (Woodpecker), so the message-queue dependency that older versions required is gone — that’s the recommended version for new deploys. Milvus 2.5 still uses Pulsar as its MQ, which means a 2.5 cluster also has a Pulsar singleton on one designated peer. The CLI handles both versions transparently.
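One way that version handling could work is to branch on the image tag passed to init --milvus-version. The sketch below is hypothetical (the function name and its yes/no output are made up for illustration) and only knows the two release lines named above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide from a Milvus image tag whether the cluster
# needs a Pulsar singleton. 2.5 uses Pulsar as its MQ; 2.6 uses the
# embedded Woodpecker WAL. Other release lines are rejected.

needs_pulsar() {
  local tag=${1#v}            # strip a leading "v": v2.6.11 -> 2.6.11
  local major minor
  IFS=. read -r major minor _ <<< "$tag"
  case "${major}.${minor}" in
    2.5) echo yes ;;
    2.6) echo no ;;
    *)   echo "unsupported Milvus version: $1" >&2; return 1 ;;
  esac
}
```

A deploy script could then render the Pulsar service only when this prints yes, and only on the designated peer.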

Five-minute deploy

After cloning the repository on every peer, the flow is three steps: init on the first (bootstrap) peer, join on every other peer, then status and smoke from any peer to verify:

# clone on every peer
git clone https://github.com/codeadeel/milvus-onprem.git ~/milvus-onprem
cd ~/milvus-onprem

# on the first peer (the bootstrap)
./milvus-onprem init --mode=distributed \
                     --milvus-version=v2.6.11 \
                     --ha-cluster-size=3
# init prints a `./milvus-onprem join <bootstrap-ip>:19500 <token>` line

# on every other peer
./milvus-onprem join <bootstrap-ip>:19500 <token>

# on any peer
./milvus-onprem status     # all green
./milvus-onprem smoke      # functional test

There’s no Kubernetes to install, no operator pod to run, no cluster CRD to define. init and join handle the work — rendering compose templates per peer, prepping host data directories, bringing up containers in dependency order, joining etcd via member-add, distributing the cluster’s shared configuration to peers via the control plane.
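The "dependency order" step can be sketched as: start a service, wait until Docker reports its healthcheck as healthy, then move on to the next one. The service names, their order, and the assumption that each compose service sets container_name to its service name are illustrative, not taken from the project’s templates.

```bash
#!/usr/bin/env bash
# Minimal sketch of dependency-ordered bring-up. Service names and order
# are hypothetical; assumes container_name == service name in compose.

# retry_until MAX_TRIES DELAY_SECONDS CMD...: rerun CMD until it succeeds.
retry_until() {
  local max=$1 delay=$2; shift 2
  local i
  for (( i = 1; i <= max; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < max )); then
      sleep "$delay"
    fi
  done
  return 1
}

# True when Docker reports the container's healthcheck as "healthy".
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Start each service only after the previous one is healthy.
bring_up_in_order() {
  local svc
  for svc in etcd minio control-plane milvus nginx; do   # hypothetical names
    docker compose up -d "$svc" || return 1
    retry_until 60 2 is_healthy "$svc" || {
      echo "$svc never became healthy" >&2
      return 1
    }
  done
}
```

Docker Compose v2 also offers docker compose up --wait, which blocks until the started services are running or healthy; an explicit loop like this one just makes the ordering and the per-service failure message visible.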

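Adding a fourth VM later takes the same single command: run ./milvus-onprem join ... on the new peer. The leader fans the topology change out to the existing peers, and the new node joins the existing etcd Raft group. Once the cluster already has --ha-cluster-size peers, the new node’s MinIO drive is added as a new per-host pool (see the MinIO trade-off below).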

The control plane

Cluster-mutating operations don’t run via SSH-to-each-peer scripting. They go through the control-plane daemon, which runs as one container per peer. The daemon exposes an authenticated HTTP API with a job system for long-running operations:

create-backup — milvus-backup snapshot to MinIO
export-backup / restore-backup — move backups between filesystems and clusters
backup-etcd — snapshot etcd
rotate-token — rotate the cluster’s bearer token across every peer atomically
remove-node — graceful peer removal with drain
migrate-pulsar — move the 2.5 Pulsar singleton to a different peer
upgrade — rolling Milvus image-tag upgrade

When an operator runs the corresponding CLI command, the CLI POSTs a job to the leader’s /jobs endpoint and the leader’s worker executes it, fanning out to other peers via daemon-to-daemon HTTP. There’s no SSH between peers anywhere in the codebase — production environments rarely allow it, and the daemon’s HTTP control plane is the supported transport.
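A client for that job API might look like the sketch below. Only the /jobs path, port 19500, bearer-token auth, and the job names come from the text; the JSON body shape, the http:// scheme, and the assumption that followers answer writes with a 307/308 redirect to the leader are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical client for the control plane's job API. Assumed: JSON body
# {"type": "<job>"}, plain http, and 307/308 redirects from followers.

# Job names listed in the text; anything else is rejected client-side.
KNOWN_JOBS="create-backup export-backup restore-backup backup-etcd rotate-token remove-node migrate-pulsar upgrade"

is_known_job() {
  local job
  for job in $KNOWN_JOBS; do
    [ "$job" = "$1" ] && return 0
  done
  return 1
}

# Build the (assumed) request body for a job type.
job_payload() {
  is_known_job "$1" || { echo "unknown job type: $1" >&2; return 1; }
  printf '{"type":"%s"}\n' "$1"
}

# submit_job PEER TOKEN JOB: POST the job to any peer and follow a redirect
# to the leader. --location-trusted keeps the Authorization header when the
# redirect points at a different host.
submit_job() {
  local peer=$1 token=$2 body
  body=$(job_payload "$3") || return 1
  curl -sS --fail --location-trusted \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "http://${peer}:19500/jobs"
}
```

With a 307 or 308 redirect, curl repeats the POST with the same body at the new location; plain -L would also follow the redirect, but curl drops a custom Authorization header on cross-host redirects unless --location-trusted is given.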

A watchdog runs inside the same daemon. It polls every container’s healthcheck and, if a local container goes unhealthy for N consecutive ticks, restarts it in place. Cross-peer reachability is also probed; if a peer disappears for N ticks, the leader emits a PEER_DOWN_ALERT line to the journal so monitoring picks it up.
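The "N consecutive ticks" rule is easy to get subtly wrong (for example, by forgetting to reset the streak), so here is a minimal sketch of it as a pure function over a sequence of observed Docker health states. The function name, the threshold, and the choice to let any non-"unhealthy" state (including "starting") reset the streak are assumptions; the real watchdog may differ.

```bash
#!/usr/bin/env bash
# Minimal sketch of a "restart after N consecutive unhealthy ticks" rule,
# written as a pure function so it can be checked without Docker.

# restart_ticks THRESHOLD STATE...: print the 1-based tick numbers at which
# a restart would fire. Any state other than "unhealthy" resets the streak,
# and so does each restart.
restart_ticks() {
  local threshold=$1; shift
  local streak=0 tick=0 state
  local fired=()
  for state in "$@"; do
    tick=$((tick + 1))
    if [ "$state" = "unhealthy" ]; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$threshold" ]; then
        fired+=("$tick")
        streak=0
      fi
    else
      streak=0
    fi
  done
  echo "${fired[*]}"
}

# In a live loop, each tick would read the real state, e.g.
#   docker inspect --format '{{.State.Health.Status}}' <container>
# and run `docker restart <container>` when the streak reaches THRESHOLD.
restart_ticks 3 healthy unhealthy unhealthy healthy unhealthy unhealthy unhealthy healthy
```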

The MinIO trade-off worth knowing about

MinIO is the storage layer underneath Milvus. There are two practical ways to lay out a multi-host MinIO cluster, and milvus-onprem supports both; the init flag --ha-cluster-size=N selects between them:

Layout	Scale-out	Host-loss tolerance
Default (per-host pools)	Trivial — joining a peer just appends a new pool.	None. Losing a host loses a pool, and any read against the bucket errors until that host comes back. Milvus’s streamingnode crashes on boot in that state and the cluster goes dark.
`--ha-cluster-size=N` (wide pool)	First N peers form one pool with N drives, erasure-coded. Peers joining beyond N land as additional per-host pools.	Loss of any single host tolerated for both reads and writes.

For operators who value host-loss tolerance over effortless scale-out, which describes most production deploys, passing --ha-cluster-size=3 (or whatever the initial cluster size is) at init is the right move. The repo’s failover doc covers the full trade-off, and the README’s quickstart uses the wide-pool layout.
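The tolerance column in the table above follows from erasure-set arithmetic. The sketch below assumes MinIO’s quorum rules: reads need as many drives as there are data shards, and writes need the same number, plus one when parity equals data. Parity is passed in explicitly because the effective value depends on MinIO’s defaults or the MINIO_STORAGE_CLASS_STANDARD setting; EC:1 for the three-drive wide pool is an assumption.

```bash
#!/usr/bin/env bash
# Minimal sketch of erasure-set tolerance, assuming MinIO's quorum rules:
#   read quorum  = data shards (K)
#   write quorum = K, or K + 1 when parity equals data
# Parity is an input; check your deployment's actual storage class.

# ec_tolerance DRIVES PARITY: print how many lost drives reads/writes survive.
ec_tolerance() {
  local drives=$1 parity=$2
  local data=$(( drives - parity ))
  local read_q=$data
  local write_q=$data
  if [ "$parity" -eq "$data" ]; then
    write_q=$(( data + 1 ))
  fi
  echo "reads=$(( drives - read_q )) writes=$(( drives - write_q ))"
}

ec_tolerance 1 0   # per-host pool: one drive, no parity
ec_tolerance 3 1   # wide pool across 3 peers, assumed EC:1
```

The per-host pool has nothing to spare, while the three-drive wide pool can lose one drive (one host) for both reads and writes, matching the table.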

Backups and migrations

Every cluster gets the official milvus-backup binary integrated as a daemon worker. Backups land in MinIO, export-backup copies them out to a filesystem path, and restore-backup brings them back into the same cluster or a different one, including across Milvus versions (a 2.5 backup restores cleanly into a 2.6 cluster). etcd snapshots are a separate, smaller knob via backup-etcd.

For the rare case of a completely lost peer (VM gone, disk wiped), the troubleshooting doc walks through the recovery procedure: etcdctl member remove, then member add, then a fresh join --existing on the replacement. From there the cluster heals itself; with the wide-pool layout, MinIO’s erasure coding reconstructs the dead peer’s share of the data automatically once the replacement comes online.
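The first step of that procedure is finding the dead peer’s member ID. A hypothetical helper that pulls it out of the default comma-separated output of etcdctl member list (ID, status, name, peer URLs, client URLs, is-learner) could look like this; the peer name and URLs in the comments are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the member ID whose name matches $1. Exits non-zero if nothing matches.
# Default line format: <id>, <status>, <name>, <peer-urls>, <client-urls>, <is-learner>

member_id_by_name() {
  local name=$1
  awk -F', ' -v n="$name" '$3 == n { print $1; found = 1 } END { exit !found }'
}

# Typical use against a healthy member (placeholders):
#   id=$(etcdctl member list | member_id_by_name peer-2)
#   etcdctl member remove "$id"
#   etcdctl member add peer-2 --peer-urls=http://10.0.0.12:2380
```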

What this isn’t

milvus-onprem is intentionally not:

A replacement for the Milvus Operator on Kubernetes. If you already run Kubernetes, the Operator is the right tool.
A managed service. There’s no SaaS layer; you run it on your own VMs.
A multi-tenant Milvus. It’s a single cluster per deployment. Multi-tenancy lives in Milvus’s own database and collection model, not at the deploy layer.

It’s also intentionally simple. The whole codebase is a few hundred lines of Bash plus a small FastAPI daemon, well under five thousand lines of code total. Every concern lives in its own file under lib/ or daemon/. If something doesn’t work, it’s the kind of codebase you can read in an afternoon and patch.

Conclusion

High availability for a vector database doesn’t have to mean Kubernetes. For teams with a fleet of plain Linux VMs and no orchestrator, milvus-onprem fills the gap between single-host standalone and a full operator deploy: three commands, N peers, self-organizing, and transparent to operate.

The repository is on GitHub at codeadeel/milvus-onprem under Apache 2.0. Issues, PRs, and feedback are welcome.