T0 — Flagship · Dual RTX 5090 · 192 GB DDR5
T0
P6 White
i9-12900K · Win · llama.cpp · 1200W PSU
ONLINE
CPU
28% 72°C 190W
RTX 5090
82% 74°C 490W
VRAM
24.1 / 32 GB
RAM
88 / 192 GB
GPU 1840 rpm · PSU 680 / 1200W (57%)
Models
llama.cpp · 4 slots
phi-4:14b · Q6_K · 12.8 GB · ⚡ 41 t/s
1 loaded · 1 generating · 12.8 GB VRAM · KV cache 42%
T0
Y70 White (Hyte)
9950X3D · Linux · Ollama · 1400W PSU
ONLINE
CPU
42% 61°C 210W
RTX 5090
69% 61°C 510W
VRAM
25.6 / 32 GB
RAM
48 / 192 GB
GPU 1620 rpm · PSU 720 / 1400W (51%)
Models
ollama
qwen2.5:72b · Q4_K_M · 38.4 GB · ● idle
llama3.3:70b · Q4_K_M · 9.2 GB · ⚡ 28 t/s
2 loaded · 1 generating · 47.6 GB VRAM · expires 8m
T1 — High RAM · 128 GB nodes
T1
GTR9 Pro
Ryzen 9 · 128 GB · Ollama · dual 10GbE
ONLINE
CPU
67% 52°C 185W
RAM
91 / 128 GB
2100 rpm · no eGPU · CPU inference
Models
ollama
deepseek-r1:32b · Q8_0 · 28.1 GB RAM · ● idle
1 loaded · 0 generating · 28.1 GB RAM offload
T1
EvoX2 (LexVault)
Ryzen AI Max · 128 GB · Ollama · Qdrant
ONLINE
CPU
31% 49°C 120W
RAM
70 / 128 GB
1640 rpm · LexVault 1390 contracts
Models
ollama
qwen2.5:14b · Q6_K · 10.8 GB · ↓ loading
1 loading · 0 generating
T1
MS-S1 Max
Core Ultra · 128 GB · Ollama · dual 10GbE
ONLINE
CPU
24% 44°C 95W
RAM
49 / 128 GB
1200 rpm
Models
ollama
no models loaded · ● idle
0 loaded
T2 — Specialists · eGPU / Arc / Apple
T2
X7 Ti
Core Ultra 32GB · DEG1 Oculink · llama.cpp
ONLINE
CPU
19% 44°C 65W
RTX 3080 Ti
44% 58°C 280W
VRAM
5.8 / 12 GB
RAM
13 / 32 GB
1980 rpm
Models
llama.cpp
mistral-nemo:12b · Q5_K_M · 5.2 GB · ● idle
1 loaded · 5.2 GB VRAM
T2
SER9 Pro
Ryzen AI 80TOPS NPU · AG03 USB4 · Ollama
ONLINE
CPU / NPU
33% 48°C 95W
RTX 2080 Ti
38% 54°C 220W
VRAM
5.9 / 11 GB
1750 rpm · AG03 850W dock
Models
ollama
no models loaded · ● idle
0 loaded · runtime reachable
T2
Crystal Arc
i7-6700K · B580 + A580 dual QSV · llama.cpp
ONLINE
CPU
11% 38°C 65W
Arc B580
31% 48°C 85W
Arc A580
18% 44°C 55W
RAM
18 / 64 GB
1540 rpm · dual QSV encode node
Models
llama.cpp
smollm2:1.7b · Q8_0 · 1.8 GB · ● idle
1 loaded · 1.8 GB VRAM
T3 / Gateway — CPU-only nodes
T3
Borgcube
i5-12400 · Arc B580 · Ollama · embed
ONLINE
CPU
23% 41°C 95W
Arc B580
12% 39°C 60W
RAM
19 / 32 GB
900 rpm
Models
ollama
nomic-embed-text · F16 · 0.8 GB · ● idle
1 loaded · 0.8 GB VRAM · embed service
GW
MS-01 (LiteLLM)
i9 · Gateway · dual 10GbE · Docker
OFFLINE
Last seen 4m 12s ago · all metrics stale · LiteLLM gateway unreachable
| Node | GPU | Util % | VRAM Used / Total | Temp | Power | Fan | Loaded Models | Active |
|---|---|---|---|---|---|---|---|---|
| P6 White | RTX 5090 | 82% | 24.1 / 32 GB | 74°C | 490W | 1840 rpm | phi-4:14b | ⚡ 1 slot |
| Y70 White | RTX 5090 | 69% | 25.6 / 32 GB | 61°C | 510W | 1620 rpm | qwen2.5:72b +1 | ⚡ 1 slot |
| X7 Ti | RTX 3080 Ti | 44% | 5.8 / 12 GB | 58°C | 280W | 1980 rpm | mistral-nemo:12b | ● idle |
| SER9 Pro | RTX 2080 Ti | 38% | 5.9 / 11 GB | 54°C | 220W | 1750 rpm | — | ● idle |
| Crystal Arc | Arc B580 | 31% | — | 48°C | 85W | 1540 rpm | smollm2:1.7b | ● idle |
| Crystal Arc | Arc A580 | 18% | — | 44°C | 55W | 1240 rpm | — | ● idle |
| Borgcube | Arc B580 | 12% | — | 39°C | 60W | 900 rpm | nomic-embed-text | ● idle |
| Mac Mini M4 | M4 Pro GPU | — | — | 42°C | 28W | — | — | ● idle |
Cluster VRAM consumed by models — 107 GB total
60.4 GB used · 46.6 GB free · 56%
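The used, free, and percentage figures on the cluster VRAM bar follow from two inputs: the 107 GB total and the 60.4 GB consumed by models. A minimal sketch of that arithmetic is below; the function name `vram_summary` and the choice to round the percentage to the nearest integer are assumptions for illustration, not something the dashboard documents.

```python
def vram_summary(used_gb: float, total_gb: float) -> dict:
    """Derive free space and utilization from used and total VRAM (hypothetical helper)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    free_gb = total_gb - used_gb
    return {
        "used_gb": round(used_gb, 1),
        "free_gb": round(free_gb, 1),
        # Assumption: nearest-integer rounding; the dashboard may round differently.
        "used_pct": round(100 * used_gb / total_gb),
    }


# Figures taken from the cluster VRAM bar above.
summary = vram_summary(used_gb=60.4, total_gb=107.0)
print(summary)
```

With these inputs the free figure comes out at 46.6 GB and the utilization at 56.4% before rounding.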
Cluster RAM offloaded by models (CPU inference)
28.1 GB offloaded · deepseek-r1 on GTR9 · 11% of fleet RAM
| Model | Quant | Node | VRAM GB | RAM Offload GB | State | t/s |
|---|---|---|---|---|---|---|
| phi-4:14b | Q6_K | P6 White | 12.8 | — | ⚡ gen | 41 |
| llama3.3:70b | Q4_K_M | Y70 White | 9.2 | — | ⚡ gen | 28 |
| qwen2.5:72b | Q4_K_M | Y70 White | 38.4 | — | ● idle | — |
| deepseek-r1:32b | Q8_0 | GTR9 Pro | — | 28.1 | ● idle | — |
| qwen2.5:14b | Q6_K | EvoX2 | 10.8 | — | ↓ loading | — |
| mistral-nemo:12b | Q5_K_M | X7 Ti | 5.2 | — | ● idle | — |
| smollm2:1.7b | Q8_0 | Crystal Arc | 1.8 | — | ● idle | — |
| nomic-embed-text | F16 | Borgcube | 0.8 | — | ● idle | — |
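The per-node VRAM totals shown on the node cards (for example 47.6 GB on Y70 White) can be cross-checked by grouping the model table by node. The sketch below does that with the rows copied from the table; the names `ROWS` and `vram_by_node` are hypothetical, and the RAM-offloaded deepseek-r1:32b row is left out because it holds no VRAM.

```python
from collections import defaultdict

# (model, node, vram_gb) rows copied from the model table above.
ROWS = [
    ("phi-4:14b", "P6 White", 12.8),
    ("llama3.3:70b", "Y70 White", 9.2),
    ("qwen2.5:72b", "Y70 White", 38.4),
    ("qwen2.5:14b", "EvoX2", 10.8),
    ("mistral-nemo:12b", "X7 Ti", 5.2),
    ("smollm2:1.7b", "Crystal Arc", 1.8),
    ("nomic-embed-text", "Borgcube", 0.8),
]


def vram_by_node(rows):
    """Sum VRAM per node, rounding to 0.1 GB to hide float noise."""
    totals = defaultdict(float)
    for _model, node, gb in rows:
        totals[node] += gb
    return {node: round(gb, 1) for node, gb in totals.items()}


totals = vram_by_node(ROWS)
# Print nodes from largest to smallest VRAM footprint.
for node, gb in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{node:12s} {gb:5.1f} GB")
```

One observation from this grouping: P6 White and Y70 White together account for 60.4 GB, the same figure as the cluster "used" value. Whether the cluster bar deliberately counts only those two nodes is not stated in the dashboard, so treat that as a guess.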
Pi Fleet — khetiai-pi-01 through khetiai-pi-20 · Tailscale 10.0.0.101–120