GAIOS Relay — Fact Sheet

Single source of truth for the fleet inference stack: claude-relay tiers, inference_proxy chokepoint, critic route law, OpenAI-API kill-switch. Stamped 2026-06-05 · canon:rule-openai-api-off-by-default-2026-06-05

1 · Topology — who talks to what

minister container ──▶ inference_proxy :8899 (GAIOS · 172.17.0.1 / 10.99.0.1 / 127.0.0.1)
                              │  audit row → mcp_audit (client, tokens, cost, stop_reason)
                              │  recall enrichment · oversize-redirect · bridge-split
                              ├─ model "chatgpt-pro"/"gaios-gpt" ──WG──▶ PC chatgpt-bridge :4242 ──▶ codex CLI (ChatGPT-Pro subscription, gpt-5.5)
                              ├─ model ^gpt-* ───▶ 403 openai_api_disabled  (unless Sabour's flag file exists)
                              ├─ /v1/messages (Anthropic shape) ───▶ claude-relay :8896 ──▶ tier cascade (table below)
                              └─ other OpenAI-shape ───▶ PC Ollama :11434

public:  https://ai.goldnetgroup.com.au/llm/  and  /bridge/  ── nginx (fleet x-api-key gate) ──▶ :8899
Everything goes through :8899. That is the point. Direct calls to 10.99.0.2:4242 or :11434 bypass capture/monitor/audit and are diagnostic-only.

2 · Critic route law (binding)

PRIMARY — audited proxy: POST http://172.17.0.1:8899/v1/chat/completions (containers; host uses 127.0.0.1; public uses /bridge/ + fleet x-api-key)
body "model":"chatgpt-pro" · header Authorization: Bearer <vault inference-proxy-token> · X-Client-ID: <your seat>

The critic is Sabour's ChatGPT-Pro SUBSCRIPTION. It has no quota. Nothing to "top up". Any 429/"quota exhausted" symptom = you are off-route (canon:rule-critic-has-no-quota-2026-06-05).

NEVER send gpt-5.5/gpt-* model names — they return 403 openai_api_disabled. The model name chatgpt-pro IS the routing sentinel; the bridge pins gpt-5.5 server-side.
Direct 10.99.0.2:4242 = health checks only. Audit history showed zero proxied critic calls 17 May → 5 Jun because canon taught the wrong primary; fixed 2026-06-05.

3 · claude-relay :8896 — tier table (live state 2026-06-05)

TierStateTargetNotes
0-MiniMax-SubscriptionONapi.minimax.io/anthropicDefault first hop for fleet (cascade-minimax-first)
0-HQ-MiniMax-BaseONapi.minimax.io/anthropicHQ lane
1-SDK-Sonnet / 3-SDK-OpusoffClaude OAuth (SDK)
2-CLI-SonnetONclaude-sonnet-4-6 (CLI OAuth)Sonnet fallback
4-CLI-Opus / 5-APIkeyoff
6-LocalLLMONPC Ollama 10.99.0.2:11434Free local lane (WG)
7-CopilotGPT5minioffcopilot-api :4141
8-APIkey-Haikuoffclaude-haiku-4-5large_prompts_only
9-ChatGPT-ProONPC bridge 10.99.0.2:4242Sabour's ChatGPT-Pro subscription · gpt-5.5 hard-pin · schema-mode tool-shim · $0 marginal · NO QUOTA
10-OpenAI-DirectOFFapi.openai.comDisabled 2026-06-05 per Sabour — API off by default; vault key dead (429) since 2026-05-24. Code also flag-gated.
1b/2b/3b/4b backupsoff/root/.claude-backup OAuthManual failover slots
11-CLI-Opus-4-7-AthenaONclaude-opus-4-7ATHENA seat
V-VisionONcascade, see §5image/document turns
Config: /home/ubuntu/api/relay-config.json — hot-reload (~5s mtime check), no restart needed for enable/disable. Code: /home/ubuntu/api/claude_relay.js (v3.8.21).

4 · inference_proxy :8899 — the audit chokepoint

FeatureDetail
AuditEvery request → mcp_audit row in gaios.db: client_id, model, tokens in/out, cost estimate, stop_reason, duration
Model dispatch (OpenAI shape)chatgpt-pro|gaios-gpt → PC bridge · ^gpt-* → 403 unless flag (§6) · else → PC Ollama
Anthropic shape /v1/messages→ claude-relay :8896; sentinel models inject tier force (e.g. minimax-fleet → 0-MiniMax)
Oversize-redirectchatgpt-pro requests >80KB bypass the bridge (codex chokes) → MiniMax Tier 0
Bridge-splitTool-semantic turns diverted off the bridge to MiniMax/Sonnet
Cost guardsInput/output token ceilings → 413 before any spend
AuthBearer <vault inference-proxy-token>; public paths get it injected by nginx after fleet x-api-key gate
Service: inference-proxy.service · Code: /home/ubuntu/api/inference_proxy.py · Health: GET :8899/health

5 · V-Vision — verified cascade

Image/document turns: MiniMax-M3 (free under subscription, multimodal-confirmed) → Claude OAuth HaikuOpenAI gpt-4o (last resort, now flag-gated §6). Vision does not use the PC bridge. Tier stays ENABLED — it works fully on the first two legs. canon:rule-v-vision-mm3-cascade-2026-06-03

6 · OpenAI API kill-switch

The OpenAI API is OFF by default. The ONLY switch is the presence of /home/ubuntu/api/openai_api_enabled — created by Sabour, nobody else.

Enforced in three places: (1) inference_proxy → 403 openai_api_disabled for ^gpt-*; (2) relay Tier 10 config-disabled; (3) openaiDirectCall() + V-Vision OpenAI leg flag-gated in code.

Symptom dictionary: 429 insufficient_quota on a gpt-* model = you hit the dead key on a forbidden route. 403 openai_api_disabled = the gate working as designed — switch to chatgpt-pro.

7 · Incident 2026-06-04 — "relay quota exhausted" (FALSE)

APOLLO held DCOA for hours claiming "relay's gpt-5 quota is exhausted… genuine infra failure… awaiting Sabour top-up". Audit truth: at 17:37 it probed gpt-4o, gpt-4, gpt-4-turbo, gpt-4o-mini, gpt-3.5-turbo through the proxy; each matched ^gpt- → api.openai.com dead key → 429. Wrong model name, not quota. The bridge was healthy the entire time (verified from APOLLO's own seat: 4.4s round-trip). Fixes: this fact sheet, the 403 gate, Tier 10 disable, canon re-stamp, route correction posted in-room (msgs 6947/6951).

8 · Files & canon

WhatWhere
Relay tiers (hot-reload)/home/ubuntu/api/relay-config.json
Relay code/home/ubuntu/api/claude_relay.js (claude-relay :8896)
Proxy code/home/ubuntu/api/inference_proxy.py (inference-proxy.service :8899)
Audit DB/home/ubuntu/api/gaios.db → mcp_audit
API kill-switch flag/home/ubuntu/api/openai_api_enabled (absent = OFF)
nginx public gates/etc/nginx/sites-enabled/claude-hq → /llm/ and /bridge/ → :8899
Canonrule-openai-api-off-by-default-2026-06-05 · rule-critic-has-no-quota-2026-06-05 · skill-chatgpt-critic-bridge-2026-06-05 (route amended) · rule-v-vision-mm3-cascade-2026-06-03 · fact-chatgpt-bridge-pc-architecture