Cluster CPU
34%
~480 cores
GPU Avg Util
61%
8 GPUs active
VRAM Used
60.4GB
of 107 GB
Total Draw
2.8kW
12 nodes
Hottest
74°C
P6 · RTX 5090
Models Loaded
7
across fleet
Active Gen.
2
inferencing
Nodes Online
12/14
2 offline
Tailscale
32/34
peers up
T0 — Flagship · Dual RTX 5090 · 192 GB DDR5
T0
P6 White
i9-12900K · Win · llama.cpp · 1200W PSU
ONLINE
CPU
28% 72°C 190W
RTX 5090
82% 74°C 490W
VRAM
24.1 / 32 GB
RAM
88 / 192 GB
GPU 1840 rpm · PSU 680 / 1200W (57%)
Models
llama.cpp · 4 slots
phi-4:14b
Q6_K
12.8 GB · ⚡ 41 t/s
1 loaded · 1 generating · 12.8 GB VRAM · KV cache 42%
T0
Y70 White (Hyte)
9950X3D · Linux · Ollama · 1400W PSU
ONLINE
CPU
42% 61°C 210W
RTX 5090
69% 61°C 510W
VRAM
25.6 / 32 GB
RAM
48 / 192 GB
GPU 1620 rpm · PSU 720 / 1400W (51%)
Models
ollama
qwen2.5:72b
Q4_K_M
38.4 GB · ● idle
llama3.3:70b
Q4_K_M
9.2 GB · ⚡ 28 t/s
2 loaded · 1 generating · 47.6 GB VRAM · expires 8m
T1 — High RAM · 128 GB nodes
T1
GTR9 Pro
Ryzen 9 · 128 GB · Ollama · dual 10GbE
ONLINE
CPU
67% 52°C 185W
RAM
91 / 128 GB
2100 rpm · no eGPU · CPU inference
Models
ollama
deepseek-r1:32b
Q8_0
28.1 GB RAM · ● idle
1 loaded · 0 generating · 28.1 GB RAM offload
T1
EvoX2 (LexVault)
Ryzen AI Max · 128 GB · Ollama · Qdrant
ONLINE
CPU
31% 49°C 120W
RAM
70 / 128 GB
1640 rpm · LexVault 1390 contracts
Models
ollama
qwen2.5:14b
Q6_K
10.8 GB · ↓ loading
1 loading · 0 generating
T1
MS-S1 Max
Core Ultra · 128 GB · Ollama · dual 10GbE
ONLINE
CPU
24% 44°C 95W
RAM
49 / 128 GB
1200 rpm
Models
ollama
no models loaded
● idle
0 loaded
T2 — Specialists · eGPU / Arc / Apple
T2
X7 Ti
Core Ultra 32GB · DEG1 Oculink · llama.cpp
ONLINE
CPU
19% 44°C 65W
RTX 3080 Ti
44% 58°C 280W
VRAM
5.8 / 12 GB
RAM
13 / 32 GB
1980 rpm
Models
llama.cpp
mistral-nemo:12b
Q5_K_M
5.2 GB · ● idle
1 loaded · 5.2 GB VRAM
T2
SER9 Pro
Ryzen AI 80TOPS NPU · AG03 USB4 · Ollama
ONLINE
CPU / NPU
33% 48°C 95W
RTX 2080 Ti
38% 54°C 220W
VRAM
5.9 / 11 GB
1750 rpm · AG03 850W dock
Models
ollama
no models loaded
● idle
0 loaded · runtime reachable
T2
Crystal Arc
i7-6700K · B580 + A580 dual QSV · llama.cpp
ONLINE
CPU
11% 38°C 65W
Arc B580
31% 48°C 85W
Arc A580
18% 44°C 55W
RAM
18 / 64 GB
1540 rpm · dual QSV encode node
Models
llama.cpp
smollm2:1.7b
Q8_0
1.8 GB · ● idle
1 loaded · 1.8 GB VRAM
T3 / Gateway — CPU-only nodes
T3
Borgcube
i5-12400 · Arc B580 · Ollama · embed
ONLINE
CPU
23% 41°C 95W
Arc B580
12% 39°C 60W
RAM
19 / 32 GB
900 rpm
Models
ollama
nomic-embed-text
F16
0.8 GB · ● idle
1 loaded · 0.8 GB VRAM · embed service
GW
MS-01 (LiteLLM)
i9 · Gateway · dual 10GbE · Docker
OFFLINE
Last seen 4m 12s ago · all metrics stale · LiteLLM gateway unreachable
Node         GPU          Util %  VRAM Used / Total  Temp  Power  Fan       Loaded Models     Active
P6 White     RTX 5090     82%     24.1 / 32 GB       74°C  490W   1840 rpm  phi-4:14b         ⚡ 1 slot
Y70 White    RTX 5090     69%     25.6 / 32 GB       61°C  510W   1620 rpm  qwen2.5:72b +1    ⚡ 1 slot
X7 Ti        RTX 3080 Ti  44%     5.8 / 12 GB        58°C  280W   1980 rpm  mistral-nemo:12b  ● idle
SER9 Pro     RTX 2080 Ti  38%     5.9 / 11 GB        54°C  220W   1750 rpm  -                 ● idle
Crystal Arc  Arc B580     31%     1.8 / 12 GB        48°C  85W    1540 rpm  smollm2:1.7b      ● idle
Crystal Arc  Arc A580     18%     1.7 / 8 GB         44°C  55W    1240 rpm  -                 ● idle
Borgcube     Arc B580     12%     0.8 / 12 GB        39°C  60W    900 rpm   nomic-embed       ● idle
Mac Mini M4  M4 Pro GPU   22%     Unified            42°C  28W    -         -                 ● idle
Cluster VRAM consumed by models — 107 GB total
60.4 GB used  ·  46.6 GB free  ·  57%
Cluster RAM offloaded by models (CPU inference)
28.1 GB offloaded  ·  deepseek-r1 on GTR9  ·  11% of fleet RAM
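The used / free / percent figures in the VRAM summary above follow from two inputs: the pool total and the amount consumed. A minimal Python sketch of that arithmetic is below. The function name `vram_summary` and its return keys are hypothetical and not taken from the dashboard's own code. The exact ratio for 60.4 of 107 GB is about 56.4%, so the 57% shown on the panel presumably comes from a different rounding rule or from unrounded inputs; that is an assumption.

```python
def vram_summary(used_gb: float, total_gb: float) -> dict:
    """Return free capacity and percent used for a VRAM pool.

    Hypothetical helper for illustration; not the dashboard's implementation.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    free_gb = round(total_gb - used_gb, 1)   # one decimal, like the panel
    pct_used = used_gb / total_gb * 100      # unrounded percentage
    return {"used_gb": used_gb, "free_gb": free_gb, "pct_used": pct_used}


# Figures taken from the cluster VRAM summary line.
summary = vram_summary(60.4, 107.0)
print(
    f"{summary['used_gb']} GB used · "
    f"{summary['free_gb']} GB free · "
    f"{summary['pct_used']:.1f}%"
)
```

Keeping the percentage unrounded inside the helper leaves the rounding choice (nearest, up, or one decimal) to whatever renders the panel.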
Model             Quant   Node         VRAM GB  RAM Off.  State      t/s
phi-4:14b         Q6_K    P6 White     12.8     -         ⚡ gen      41
llama3.3:70b      Q4_K_M  Y70 White    9.2      -         ⚡ gen      28
qwen2.5:72b       Q4_K_M  Y70 White    38.4     -         ● idle     -
deepseek-r1:32b   Q8_0    GTR9 Pro     -        28.1 GB   ● idle     -
qwen2.5:14b       Q6_K    EvoX2        10.8     -         ↓ loading  -
mistral-nemo:12b  Q5_K_M  X7 Ti        5.2      -         ● idle     -
smollm2:1.7b      Q8_0    Crystal Arc  1.8      -         ● idle     -
nomic-embed-text  F16     Borgcube     0.8      -         ● idle     -
Pi Fleet — khetiai-pi-01 through khetiai-pi-20 · Tailscale 10.0.0.101–120