# AINode — Complete Local AI Platform for NVIDIA GPUs

Last updated: 2026-04-15
Current version: v0.4.0
License: Apache 2.0
Source: https://github.com/getainode/ainode
Marketing site: https://ainode.dev

## Executive summary

AINode is a free, open-source local AI platform that turns any Linux host with an NVIDIA GPU into a complete inference + fine-tuning stack. It installs with one command, ships as a single container image, and clusters automatically: drop AINode on two or more NVIDIA DGX Sparks on the same subnet, and it shards one large model across every GPU they collectively hold.

The v0.4.0 release (April 2026) delivers the first real tensor-parallel distributed inference verified on NVIDIA GB10 hardware, with NCCL running over RoCE on ConnectX-7 at 200 Gbps and 244 GB of aggregated VRAM across two nodes.

## The problem AINode solves

Enterprise GPU hardware (NVIDIA DGX Spark, ASUS GX10, Dell AI Factory, HP AI Workstations) is increasingly accessible — you can buy a 128 GB unified-memory GB10 machine for $2,999 — but the software stack to actually serve production-grade AI on that hardware is a month-long engineering project:

- vLLM requires CUDA, Python venvs, precise version pinning, and source builds for new chips
- Ray for multi-node inference requires hand-rolled SSH bootstrapping, port planning, and placement groups
- NCCL on GB10 required patching against a three-node ring bug (fixed upstream, but pinned in eugr/spark-vllm-docker)
- Cross-node fabrics (RoCE, InfiniBand, Tailscale) each have their own NIC selection, MTU, and kernel-module constraints
- A browser chat UI, an OpenAI-compatible API, model download management, and fine-tuning are five separate codebases for most teams

Cloud AI services (OpenAI, Anthropic, Google) sidestep all of that, but they replace engineering with recurring cost ($100 to $10,000+/month), send your data off your network, impose rate limits, and leave you internet-dependent.

AINode eliminates both ends of that trade: one container, one command, and the hardware you already own becomes a complete private AI platform.

## How AINode works

### Installation

```
curl -fsSL https://ainode.dev/install | bash
```

The installer:
1. Sanity-checks Linux + Docker + NVIDIA Container Toolkit
2. Pulls `ghcr.io/getainode/ainode:<version>` (~18 GB, one-time)
3. (Optional, if `AINODE_PEERS` is set) bootstraps passwordless SSH to peer nodes
4. Writes `/etc/systemd/system/ainode.service` (a systemd unit whose ExecStart is `docker run ... ainode`)
5. Installs `/usr/local/bin/ainode` — a host wrapper that forwards `ainode <cmd>` into the running container, and runs `ainode update` directly on the host

### Runtime architecture

One container per node holds the entire AINode stack:

- `ainode/api/` — aiohttp API server (OpenAI-compatible `/v1` + internal cluster endpoints)
- `ainode/engine/docker_engine.py` — solo-mode spawns `vllm serve`; distributed-mode delegates to eugr's `launch-cluster.sh`
- `ainode/web/` — browser chat UI, model downloads, training UI, cluster config
- `ainode/discovery/` — UDP broadcast announcer + listener on port 5679
- `ainode/cli/` — `ainode start`, `ainode status`, `ainode service ...`, `ainode update`
- `ainode/service/systemd.py` — unit file renderer
- `ainode/training/` — LoRA and DDP fine-tuning

Distributed mode (`distributed_mode == "head"`) shells out to `/opt/spark-vllm-docker/launch-cluster.sh` inside the container. That launcher, from eugr/spark-vllm-docker, SSHes into each peer, forms Ray head/worker, and invokes vLLM with TP = (1 + number of peers).

### Discovery and clustering

Every running AINode periodically broadcasts a `NodeAnnouncement` over UDP on port 5679 with its node ID, model, API URL, distributed mode, and advertised peer IPs. Every AINode also listens and dedupes peers by node ID and source IP. The chat UI surfaces the live cluster membership; a head node can flip to distributed mode with one click once enough peers are visible.

Discovery is deliberately subnet-scoped (UDP broadcast) — there is no central registry, no etcd, no control plane. A cluster of N Sparks on the same 10 GbE or 200 Gbps fabric finds each other in under a second.

### Cross-node transport

NCCL (the NVIDIA Collective Communications Library) is what moves tensors between GPUs during sharded inference. AINode v0.4.0 prefers RoCE (RDMA over Converged Ethernet) when ConnectX-7 + `mlx5` drivers are present, which delivers 200 Gbps with kernel bypass. The `NCCL_SOCKET_IFNAME`, `GLOO_SOCKET_IFNAME`, `UCX_NET_DEVICES`, and `NCCL_IB_HCA` environment variables are set automatically from the host's visible HCAs; the UI's cluster-config panel exposes the chosen interface.

When RoCE is not available, NCCL falls back to socket transport. This works but is roughly 10x slower — token latency on a 70B model can go from ~30ms/token to ~300ms/token. For home-lab setups without ConnectX-7, AINode runs and serves, just slower.

### Storage

Models, logs, datasets, and fine-tune artifacts live on the host at `~/.ainode/`, mounted into the container at `/root/.ainode`. This means:
- `docker pull` + restart never touches user data
- Across cluster members, models can be shared via a NFS-on-top-of-ext4 mount point
- The container is stateless

## Feature reference

### Chat UI (port 3000)

- Streaming responses, server-sent events
- Multi-turn conversations with history persisted to localStorage
- Model switcher (loads whichever model is currently served by the engine)
- System prompt editor
- Works on desktop + mobile browsers
- No authentication by default; enable with `ainode auth enable`

### API (port 8000)

- `POST /v1/chat/completions` — OpenAI-compatible
- `POST /v1/completions` — OpenAI-compatible
- `POST /v1/embeddings` — OpenAI-compatible
- `GET /v1/models` — lists currently served model
- Internal cluster routes: `/api/nodes`, `/api/cluster/resources`, `/api/engine/start`

Latency numbers (warmed, single-node DGX Spark, 70B-class model):
- Time to first token: 80-200 ms
- Steady-state throughput: 30-50 tokens/sec (depends on model + batch)

Compare cloud: 200-2000 ms TTFT, per-token cost, rate limits.

### Fine-tuning

Browser panel supports:
- LoRA fine-tune (default, lightweight)
- Full fine-tune (GPU-memory permitting)
- Dataset upload (JSONL, Hugging Face datasets)
- Hyperparameter editor (learning rate, batch size, epochs, LoRA rank/alpha)
- Live loss curves
- Eval on a held-out split
- Artifacts saved to `~/.ainode/training/<run-id>/`

### Distributed inference

- Tensor-parallel (TP) — shards each layer's weight tensor across N GPUs. Best for models that fit in aggregate VRAM but not one GPU.
- Pipeline-parallel (PP) — shards by layer groups. Higher throughput per request at the cost of more batching complexity.
- Ray head auto-starts on head node; workers start in peer containers via SSH
- NCCL forms a cross-node ring; placement groups pin workers to specific nodes

## Supported models

AINode supports any vLLM-compatible model. Curated defaults in the UI:

- Meta Llama 3.1 (8B, 70B, 405B)
- Mistral 7B, Mixtral 8x7B
- Google Gemma 2 (2B, 9B, 27B)
- Qwen 2.5 (0.5B through 72B)
- DeepSeek V3
- Phi-3 mini / medium
- Command R / Command R+
- Any Hugging Face repo ID (paste it into the download UI)

For models that don't fit one GPU, the UI flips to "distributed" mode when enough peers are discovered — a 70B model shards TP=2 across two DGX Sparks; a 405B model in principle shards TP=4 across four.

## Hardware compatibility

| Hardware | Starting price | GPU memory | Status |
|----------|---------------|------------|--------|
| NVIDIA DGX Spark | $2,999 | 128 GB unified (GB10) | Verified v0.4.0 TP=2 |
| ASUS GX10 | $3,499 | 128 GB unified (GB10) | Verified |
| Dell AI Factory | $4,999+ | Varies (H100, H200, etc.) | Supported |
| HP AI Workstations | $3,999+ | Varies | Supported |
| Custom Linux + NVIDIA GPU | Varies | 8 GB minimum | Supported |

CUDA 13 drivers are required. AINode is tested on Ubuntu 22.04; other modern Linux distributions should work with minor adaptations.

## Networking requirements

For single-node: nothing special. `docker run --gpus all` + port 3000.

For distributed:
- **Passwordless SSH** between head and peers (bootstrapped by installer via `AINODE_PEERS` or `--setup-ssh`)
- **A shared L2 subnet** reachable via one NIC per host (multi-NIC routing ambiguity causes Ray placement-group hangs)
- **Ideally, RoCE-capable fabric** — ConnectX-7 at 200 Gbps, MTU 9000. Without it, NCCL falls back to sockets (~10x slower).
- **UDP port 5679** open on the cluster subnet for discovery
- **TCP ports 6379 (Ray head), 10001-10009 (Ray workers), 29500 (NCCL)** — auto-selected
- **Tailscale** can carry management SSH, but NCCL should go over the direct fabric (Tailscale adds ~2 ms RTT)

## Release registries

v0.4.0 is published on two container registries with identical images:

- **GHCR (canonical)** — `ghcr.io/getainode/ainode:0.4.0`, `ghcr.io/getainode/ainode:latest`. This is what the installer pulls. No rate limits for anonymous pulls.
- **Docker Hub (mirror)** — `argentaios/ainode:0.4.0`, `argentaios/ainode:latest`. Public mirror, for discoverability.

The base image (eugr/spark-vllm-docker plus NCCL patch) is at `ghcr.io/getainode/ainode-base:c026c92` (also pinned as `0.4.0-c026c92`).

## License and publisher

- License: Apache 2.0 (full text: https://www.apache.org/licenses/LICENSE-2.0)
- Publisher: Argentos AI (https://argentos.ai)
- Source of truth: https://github.com/getainode/ainode
- Powered by argentos.ai (footer in every UI)

## Upgrade path

On any installed host:

```
ainode update
```

Which runs `docker pull ghcr.io/getainode/ainode:latest && systemctl restart ainode`. Models, logs, datasets, and fine-tunes live on the host and are untouched by upgrades.

To pin a specific version:

```
AINODE_IMAGE=ghcr.io/getainode/ainode:0.4.1 ainode update
```

To roll back, set `AINODE_IMAGE` to a previous tag and rerun.

## Project status, April 2026

- v0.4.0 shipped with container-native distribution
- Two-node TP=2 distributed inference verified on real GB10 hardware
- Three-node topologies pending fabric tuning work (a NCCL 3-node ring bug was fixed upstream, pinned in the eugr base)
- Fine-tuning UI is feature-complete for LoRA; full-fine-tune is functional but UX is pending polish
- Marketing site (https://ainode.dev) is live with install CTA, product screenshots, and cluster demo image
- Container image is public on GHCR + Docker Hub

## Design principles

- **One container, everything inside.** No host Python venv, no source builds. `docker pull` is the install; `docker pull + systemctl restart` is the upgrade.
- **No external control plane.** UDP discovery on the local subnet, no registry, no etcd, no cloud callbacks.
- **Fail loudly, not silently.** If NCCL can't use RoCE, the logs say so. If a peer isn't reachable, the UI says so. If the disk is full, the engine refuses to start. The system tells you the truth.
- **Honest UI.** The cluster view reflects actual cluster state, not wishful thinking. Distributed means distributed; solo means solo.
- **Apache 2.0, forever.** No dual-license, no enterprise tier.

## Further reading

- Architecture, state-of-play, and lessons learned: https://github.com/getainode/ainode#readme
- Runbooks (ops/runbooks/): release-flow.md, dev-workflow.md, docker-engine-deploy.md, 2026-04-14-sharding-lessons-learned.md
- Release notes: https://github.com/getainode/ainode/releases
- Full docs: https://docs.argentos.ai