# AINode

> Free, open-source local AI platform (Apache 2.0) that turns any NVIDIA GPU into a complete AI stack — browser chat UI, OpenAI-compatible REST API, LoRA fine-tuning, and automatic multi-node tensor-parallel distributed inference. v0.4.0 verified on NVIDIA DGX Spark with 244 GB aggregated VRAM across two nodes. One command to install.

Last updated: 2026-04-15
Current version: v0.4.0

## What AINode is

A single container image that bundles vLLM (inference engine), Ray (cross-node orchestration), patched NCCL (for GB10 cross-node all-reduce), an aiohttp API server, and a browser chat UI. Install with one command, add more machines on the same subnet, and they find each other over UDP broadcast. One large model shards across every GPU in the cluster.

## Key capabilities

- Browser chat UI on port 3000 with streaming responses
- OpenAI-compatible REST API on port 8000 (/v1/chat/completions, /v1/completions, /v1/embeddings)
- LoRA and full fine-tuning from the browser with dataset upload and live loss curves
- Automatic peer discovery over UDP on port 5679 (announce + listen)
- Tensor-parallel and pipeline-parallel distributed inference via Ray + NCCL
- NCCL transport over RoCE (200 Gbps on ConnectX-7) when RDMA is present, with graceful socket fallback
- 50+ supported models via Hugging Face (Llama 3/3.1, Mistral, Mixtral, Qwen 2.5, Gemma 2, DeepSeek V3, Phi-3, and any vLLM-compatible weights)
- Unified container image on GHCR and Docker Hub
- systemd-managed service, host wrapper at /usr/local/bin/ainode for zero-docker user workflow
- `ainode update` command: docker pull + systemctl restart
- Apache 2.0 license

## Target hardware

- NVIDIA DGX Spark (GB10, 128 GB unified memory) — verified v0.4.0 TP=2
- ASUS GX10 (GB10, 128 GB unified memory) — verified
- Dell AI Factory (NVIDIA GPU) — supported
- HP AI Workstations (NVIDIA GPU) — supported
- Any Linux host with an NVIDIA GPU and CUDA 13 drivers — supported (8 GB GPU minimum)

## State of distributed inference (v0.4.0, April 2026)

- Two-node tensor-parallel (TP=2) inference on 2× DGX Spark: verified end-to-end with a 70B-class model, ~35 tokens/sec post-warmup
- NCCL uses RoCE transport ("Using network IB" in vLLM worker logs) at 200 Gbps
- Single NIC per cluster subnet is required (multi-NIC routing ambiguity causes Ray placement-group hangs)
- Three-node topologies require careful fabric tuning; two-node is the tested-and-shipped baseline

## Installation

Single node:
```
curl -fsSL https://ainode.dev/install | bash
```

Multi-node (head + peers):
```
AINODE_PEERS="10.0.0.2,10.0.0.3" curl -fsSL https://ainode.dev/install | bash
```

Update:
```
ainode update
```

## Pull directly

- GHCR (canonical): `docker pull ghcr.io/getainode/ainode:latest`
- Docker Hub (mirror): `docker pull argentaios/ainode:latest`

## Links

- Website: https://ainode.dev
- Source: https://github.com/getainode/ainode
- Marketing site source: https://github.com/getainode/ainode.dev
- Org profile: https://github.com/getainode
- Docs: https://docs.argentos.ai
- Install: https://ainode.dev/install
- Release notes: https://github.com/getainode/ainode/releases
- Issues: https://github.com/getainode/ainode/issues
- Long-form reference: https://ainode.dev/llms-full.txt

## Publisher

- Organization: Argentos AI
- Website: https://argentos.ai
- License: Apache 2.0
- Container registry (GHCR): https://github.com/orgs/getainode/packages
- Container registry (Docker Hub): https://hub.docker.com/u/argentaios

## Contact

- GitHub Issues: https://github.com/getainode/ainode/issues
- Security: see SECURITY.md in the ainode repo