Architecture / Deployment

From DGX to Licensed Product

A complete architecture plan for deploying the Decision Intelligence Platform as a production-grade, multi-tenant licensed product — covering platform options, required components, time estimates, and the recommended path.

What exists today

The platform is a fully operational, local-first decision intelligence pipeline running on the DGX. It is not a prototype: it has processed 68 real cases, holds 84,112 indexed chunks from 190 trusted books, and runs a live Cloudflare Worker interface, queue-based job processing, and a master strategist evaluation layer.

Compute

NVIDIA DGX

Local inference via Ollama. qwen2.5:3b embeddings (2,048 dims), qwen2.5:14b option generation. 2.5GB embedding index. All private, zero API cost for retrieval.

Orchestration

OpenClaw + Systemd

Queue watcher service, cron jobs, embedding watchdog. Event-driven via Cloudflare KV triggers. Fallback model chain: HTTP gateway → Claude CLI → OpenAI → Anthropic API.

Interface

Cloudflare Pages + Worker

Static frontend at visuals.professionalopinions.com.au. Worker handles challenge submission, KV writes, email notifications, Attio CRM upsert. Browser polls for results.

Synthesis

Anthropic Sonnet

Final synthesis via Claude CLI with OAuth auto-refresh. GDELT live context injection. Source-backed output with attribution tracking and compensation ledger.

Where can it run?

Viable with caveats
Option B

Hybrid — DGX + Cloud wrapper

DGX handles heavy inference. AWS handles customer-facing layer: auth, billing, dashboards, API routing. Connected via Tailscale or VPN.

Strengths
  • Lowest inference cost — DGX already paid for
  • Maximum GPU performance for local models
  • Fast to stand up
Tradeoffs
  • DGX is a single point of failure
  • If the box goes down, every customer is affected
  • Not viable for mission-critical SLAs without a standby
  • Harder to scale beyond one concurrent heavy job
Operational complexity
Option C

Co-located DGX

Move the DGX into a data centre with proper uptime SLAs, redundant power, and high bandwidth. Wrap with cloud services for customer layer.

Strengths
  • Keeps GPU performance, eliminates home-lab risk
  • Colocation cost ~$500–1,500/month vs $3,000+/month cloud GPU
  • Dedicated hardware — no noisy neighbours
Tradeoffs
  • Physical logistics — shipping, racking, power
  • Harder to scale horizontally
  • Remote access dependent on data centre network
Emerging
Option D

NVIDIA DGX Cloud

NVIDIA's managed DGX-as-a-service. Pay-per-hour GPU access. Keeps the workload native to DGX architecture without owning hardware.

Strengths
  • No hardware ownership risk
  • Native DGX performance
  • Scales elastically
Tradeoffs
  • Early stage — limited regions
  • Expensive at scale
  • Vendor lock-in to NVIDIA platform

Production architecture (recommended path)

CUSTOMER BROWSER
      │
      ▼
CLOUDFLARE EDGE (Pages + Workers)
      │  Static frontend · Challenge submission · Result polling
      │  Per-customer auth token validation
      ▼
AWS API GATEWAY  ← JWT auth ← Customer identity
      │
      ├──▶ ECS / EKS CLUSTER
      │     │
      │     ├── API service (FastAPI)          ← handles job intake, result delivery
      │     ├── Queue processor workers        ← reads SQS, runs pipeline
      │     ├── Enrollment sync service        ← KV → users.jsonl
      │     └── Admin service (internal only)  ← manage customers, toggle features
      │
      ├──▶ AWS SQS               ← durable job queue (survives worker restarts)
      │
      ├──▶ AWS RDS (Postgres)    ← customer accounts, billing, usage, attribution
      │
      ├──▶ AWS SECRETS MANAGER   ← per-customer Anthropic API keys (encrypted at rest)
      │
      └──▶ INFERENCE LAYER
            │
            ├── DGX (local / colo)
            │     Ollama · qwen2.5:3b embeddings · qwen2.5:14b options
            │     LanceDB / JSONL index · 84,112 chunks
            │
            └── AWS GPU fallback (g4dn.xlarge / p3)
                  Triggered only if DGX unavailable

GDELT ─────────────▶ live context injection (per-case)
ANTHROPIC API ─────▶ Sonnet synthesis (customer's own key)
STRIPE ────────────▶ subscription billing + usage metering
TAILSCALE ─────────▶ remote access to DGX + all nodes
Multi-tenancy

Customer isolation

Each customer gets their own account, isolated data namespace, and encrypted Anthropic API key stored in Secrets Manager. Per-customer rate limiting and usage tracking from day one.
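
A minimal sketch of the per-customer key lookup this implies, assuming one Secrets Manager secret per tenant. The secret naming scheme, the JSON payload shape, and the cache TTL are illustrative choices, not fixed platform conventions; `client` stands in for a boto3 Secrets Manager client (anything exposing `get_secret_value(SecretId=...)`), injected so the logic can be exercised without AWS:

```python
import json
import time

class TenantKeyStore:
    """Per-customer Anthropic API keys, one Secrets Manager secret per tenant."""

    def __init__(self, client, prefix="customers", ttl_seconds=300):
        self.client = client          # boto3 Secrets Manager client in production
        self.prefix = prefix
        self.ttl = ttl_seconds
        self._cache = {}              # tenant_id -> (expires_at, key)

    def secret_id(self, tenant_id: str) -> str:
        # Illustrative naming convention: customers/<tenant>/anthropic-api-key
        return f"{self.prefix}/{tenant_id}/anthropic-api-key"

    def get_key(self, tenant_id: str) -> str:
        # Serve from a short-lived cache so we don't pay the Secrets
        # Manager per-API-call charge on every single job.
        hit = self._cache.get(tenant_id)
        if hit and hit[0] > time.time():
            return hit[1]
        resp = self.client.get_secret_value(SecretId=self.secret_id(tenant_id))
        key = json.loads(resp["SecretString"])["api_key"]
        self._cache[tenant_id] = (time.time() + self.ttl, key)
        return key

    def revoke(self, tenant_id: str) -> None:
        # Drop the cached copy immediately; rotating or deleting the
        # secret itself happens through the Secrets Manager API.
        self._cache.pop(tenant_id, None)
```

Caching matters because Secrets Manager bills per API call, and `revoke` drops the cached copy so a rotated key stops being used within one request rather than one TTL.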

Interface

Unified command centre

Single web app surfacing all platform capabilities. Feature flags control which modules each licensed customer can access. Admin panel for you to manage customers, toggle features, inspect usage.

Token visibility

Cost dashboard

Real-time view: tokens consumed by model, by feature, by day. Pulls from Anthropic usage API plus local instrumentation. Budget alerts and cost projections per customer.
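
The aggregation behind such a dashboard can be sketched as a pure roll-up over usage events, priced per MTok as in the Anthropic pricing table later in this document. The event schema and model identifiers here are hypothetical placeholders for the local instrumentation:

```python
from collections import defaultdict

# $/MTok (input, output) — Sonnet/Opus/Haiku list prices; keys are placeholders.
PRICES = {
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.6": (5.0, 25.0),
    "haiku-4.5": (1.0, 5.0),
}

def event_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost of one model call."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

def rollup(events):
    """Aggregate usage events into (customer, day, model) -> USD.

    Each event is a dict with customer, day, model, tokens_in and
    tokens_out — a hypothetical shape for rows in usage_events.
    """
    totals = defaultdict(float)
    for e in events:
        totals[(e["customer"], e["day"], e["model"])] += event_cost(
            e["model"], e["tokens_in"], e["tokens_out"])
    return dict(totals)
```

The same roll-up keyed by feature instead of model gives the by-feature view, and a daily sum per customer is what budget alerts compare against.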

Stability

Self-healing layer

Circuit breakers on all external APIs — if Anthropic, GDELT, or any dependency is down, the pipeline degrades gracefully instead of crashing. Queue-based async so nothing dies on worker restart.
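
A circuit breaker here can be as small as a failure counter and a timestamp. A minimal sketch, with illustrative (untuned) thresholds and an injected clock for testability:

```python
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    """Minimal circuit breaker for one external dependency.

    After max_failures consecutive failures the circuit opens: calls
    fail fast with CircuitOpen until reset_after seconds have passed,
    then one trial call is let through (half-open).
    """

    def __init__(self, max_failures=5, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("dependency marked down; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

Wrap each external call site (Anthropic synthesis, GDELT fetch) in its own breaker so one failing dependency degrades only its feature, and re-queue the job for retry when `CircuitOpen` is raised.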

Reliability

Multi-AZ deployment

Two availability zones minimum. Health checks and auto-restart on ECS. Automated database backups with tested restore. PagerDuty or equivalent for on-call alerting.

Remote access

Operator tooling

Tailscale mesh for SSH from anywhere. AWS SSM Session Manager for browser-based shell — no open ports needed. Admin API for triggering restarts, draining queues, revoking customer keys.

Billing

Stripe integration

Subscription billing with optional usage-based components. Each customer brings their own Anthropic key — they own the model cost directly. Your license fee is separate from their API spend.

Recovery

Runbook + DR

Documented recovery procedures for every failure mode. Automated restore tests monthly. Configuration backups versioned in S3. No undocumented single points of failure.

Security

Customer data isolation

Row-level security in Postgres. API keys never logged. Network isolation per tenant where possible. Audit trail for all admin actions. TLS everywhere, no plaintext secrets in environment variables.
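
One common Postgres pattern for this, sketched under the assumption that tenant scoping rides on a session setting: enable RLS per table, then set the tenant id per transaction from the authenticated request. The `app.current_tenant` setting name and `jobs` table are illustrative, not the platform's actual schema:

```python
import re

# Illustrative one-time DDL: every query on jobs is filtered to the
# tenant named in the app.current_tenant session setting.
RLS_DDL = """
ALTER TABLE jobs ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON jobs
    USING (customer_id = current_setting('app.current_tenant'));
"""

_TENANT_RE = re.compile(r"^[a-z0-9_-]{1,64}$")

def set_tenant_sql(tenant_id: str) -> str:
    """Build the per-transaction SET LOCAL statement.

    The id is validated against a strict allow-list first, so the value
    cannot smuggle SQL even though it is interpolated into the statement.
    """
    if not _TENANT_RE.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"SET LOCAL app.current_tenant = '{tenant_id}'"
```

The API layer runs `set_tenant_sql(...)` at the start of every transaction; after that, application code cannot read another tenant's rows even if a query forgets its WHERE clause.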

Option A — Solo build (you)
$0
labour cost
5–6 months

Your time is the cost. AWS setup and tooling during build: ~$50–100/mo (dev environment, test deployments). No contractor spend. Slowest path but highest control.

Option B — You + 1 engineer (contractor)
$40–80K
total build cost (AUD)
3–4 months

One strong DevOps/fullstack engineer at $150–250/hr (contract rate, AU market). 300–400 hours over 3–4 months. AWS dev environment: ~$200–400 total during build. Fastest credible path to launch.

Option C — MVP only (fastest to revenue)
$15–30K
total build cost (AUD)
6–8 weeks

Containerise + auth + one paying customer + manual billing. Skip dashboard, skip self-healing. ~100–150 contractor hours. Enough to validate willingness to pay before full investment.

Phase-by-phase breakdown

Contractor cost estimates assume AU market rate of $150–250/hr for a senior DevOps/fullstack engineer. Solo columns assume your time only (no cash outlay beyond AWS).

Phase 1 · Containerisation · 2–3 wks · solo $0 · contractor $5–10K
Dockerise all pipeline services. Docker Compose for local dev. CI/CD pipeline (GitHub Actions → ECR → ECS). Smoke tests pass in containers before anything else.

Phase 2 · Multi-tenancy · 3–4 wks · solo $0 · contractor $8–14K
Auth layer (JWT + API keys). Per-customer Anthropic key vault in Secrets Manager. Data isolation in Postgres. Per-customer rate limiting and usage tracking.

Phase 3 · Unified interface · 4–6 wks · solo $0 · contractor $10–20K
Web app surfacing all capabilities behind auth. Feature flags per customer tier. Admin panel: manage customers, toggle features, inspect usage and attribution.

Phase 4 · Cost dashboard · 2–3 wks · solo $0 · contractor $5–10K
Anthropic usage API + local instrumentation. Per-customer token and cost visibility. Budget alerts. Stripe subscription billing with usage-based components.

Phase 5 · AWS deployment · 3–4 wks · solo $200–400 · contractor $8–14K
ECS on AWS, multi-AZ. Health checks, auto-restart. RDS Postgres. SQS job queue. Secrets Manager. Cloudflare stays as edge + interface layer.

Phase 6 · Self-healing · 2–3 wks · solo $0 · contractor $5–10K
Circuit breakers on Anthropic, GDELT, and all external APIs. Queue-based async throughout. PagerDuty alerting. Automated DB backup and tested restore procedure.

Phase 7 · Remote access · 1 wk · solo $0 · contractor $2–4K
Tailscale mesh across all nodes. AWS SSM Session Manager. Admin API (restart, drain, revoke). Runbook documentation for every failure mode.

Phase 8 · Hardening · 2–3 wks · solo $500–2K · contractor $5–10K
Load testing (10× peak). Security audit or external penetration test. Customer onboarding flow. Pilot with 2–3 known customers before public launch.

Total · 5–6 months · solo ~$700–2,400 AUD · contractor ~$48–92K AUD
AWS infrastructure during build is modest: a dev ECS cluster, RDS dev instance, and S3 bucket cost ~$50–100/mo. Run it for 3–6 months during development = $150–600 total before launch. Not a material build cost.
Start with ECS, not Kubernetes. Kubernetes adds significant operational burden — cluster management, node pools, YAML sprawl. ECS (AWS Elastic Container Service) is production-grade, simpler, and handles self-healing containers, auto-scaling, and rolling deploys without the overhead. Migrate to EKS only if you genuinely hit problems ECS cannot solve.
Phase 1 priority: containerise first, cloud second. Get everything running in Docker locally before touching AWS. If it doesn't run cleanly in a container, it won't run cleanly in the cloud either. This also lets you test the full pipeline in isolation — embedding, retrieval, synthesis, queue — before the network and auth complexity arrive.
The DGX is an asset, not a liability — use it right. The embedding index (2.5GB, 84K chunks) is the core IP. Keep local inference on the DGX or co-located hardware. Cloud GPU instances are expensive and unnecessary for inference your hardware already handles. Use AWS for stateless API workers, auth, and databases only.
Customer brings their own Anthropic key — design around this early. This is the right model: they own the model cost, you own the platform. But it means your auth and secrets layer must handle key storage, rotation, and revocation cleanly from day one. Do not retrofit this — it touches every part of the pipeline.
Define "mission critical" before building for it. True mission-critical (99.9%+ uptime SLA) requires redundant DGX or a hot standby, multi-AZ everything, and tested failover. That is materially more expensive and complex than "highly available." For a v1 licensing product, "highly available with documented recovery" (99.5%) is likely sufficient and far cheaper to build.
Build observability before features. Before you add licensed modules or customer-facing features, instrument the pipeline: request latency, model call success/failure rates, queue depth, embedding hit rates, cost per case. You cannot debug or sell a system you cannot measure.

Where to start

Containerise the pipeline · Week 1–3
  • Write Dockerfiles for each service: API, queue processor, enrollment sync, embedding watchdog
  • Docker Compose file that runs the full pipeline locally end-to-end
  • Verify: submit a challenge, watch it process, receive a result — all inside containers
  • CI pipeline: push to GitHub → build images → push to ECR
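
The verify step above can be automated as a tiny smoke test. A sketch with injected HTTP callables so it can run against the local Compose stack or in CI; the endpoint paths and response fields are hypothetical, not the platform's actual API contract:

```python
import time

def submit_and_poll(post, get, challenge, timeout=120.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """End-to-end smoke check: submit a challenge, poll until a result.

    post/get are injected callables (e.g. thin wrappers over requests
    pointed at the local stack); clock/sleep are injectable for tests.
    """
    job = post("/challenges", challenge)           # -> {"job_id": ...}
    deadline = clock() + timeout
    while clock() < deadline:
        status = get(f"/jobs/{job['job_id']}")     # -> {"state": ..., "result": ...}
        if status["state"] == "done":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(f"job failed: {status}")
        sleep(interval)
    raise TimeoutError("no result before deadline")
```

Run it after `docker compose up` and again in CI after every image build; a green run proves intake, queue, pipeline, and result delivery all work inside containers.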
Add auth and multi-tenancy skeleton · Week 3–6
  • Postgres schema: customers, api_keys, usage_events, jobs
  • JWT auth middleware in the API service
  • Secrets Manager integration — store and retrieve per-customer Anthropic keys
  • First real customer account created and a job processed under their key
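
What the JWT middleware has to check can be shown with a stdlib-only HS256 verifier. This is a sketch of the checks, not a recommendation to hand-roll: in production use a maintained library (e.g. PyJWT) instead:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt(token: str, secret: bytes, now=None) -> dict:
    """Verify an HS256 JWT and return its claims, or raise ValueError."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":   # never honour attacker-chosen algs
        raise ValueError("unsupported alg")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and (now or time.time()) >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The two non-negotiables the sketch illustrates: pin the algorithm server-side, and compare signatures in constant time.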
Deploy to AWS · Week 6–9
  • ECS cluster with the containerised services
  • RDS Postgres (multi-AZ), SQS queue, Secrets Manager
  • Cloudflare Worker updated to point at AWS API Gateway instead of DGX directly
  • Tailscale mesh connected — SSH access to all nodes from anywhere
Pilot with first customer · Week 9–12
  • Onboard one known customer — they supply their Anthropic key, you supply access
  • Manual billing acceptable at this stage
  • Instrument everything: latency, cost per case, errors
  • Fix the top 5 issues before opening to more customers
Cost dashboard + Stripe · Week 10–14
  • Token visibility per customer: by model, by feature, by day
  • Stripe subscriptions with usage-based metering if needed
  • Budget alerts so customers know before they overspend
Self-healing and hardening · Week 12–18
  • Circuit breakers on all external API calls
  • Automated DB backup with monthly tested restore
  • Load testing at 10× expected peak
  • Runbook for every failure mode documented and tested

What it costs to run

Three scenarios: Pilot (1–5 customers, low volume), Growth (10–30 customers), Scale (50+ customers). All figures in USD/month unless noted. Anthropic API costs are not included — customers bring their own keys and pay Anthropic directly. Prices current as of April 2026.

Anthropic API — customer's own key

Each customer pays Anthropic directly. These are the costs your customers will incur per use of the platform. Understanding this is critical for positioning your license fee correctly.

Model · Input · Output · Typical strategy case cost

Sonnet 4.6 (current) · $3 / MTok · $15 / MTok · ~$0.30–0.80
Primary synthesis model. A typical case: ~30K input tokens (corpus + source pack + prompt), ~8K output. Cost ≈ $0.09 input + $0.12–0.40 output.

Opus 4.6 (premium) · $5 / MTok · $25 / MTok · ~$0.75–2.00
Reserve for premium tier / complex cases. At these list prices the same token volumes cost ~1.7× Sonnet; the higher end of the range reflects the longer, more complex cases typically routed to Opus.

Haiku 4.5 (routing/triage) · $1 / MTok · $5 / MTok · ~$0.03–0.10
Use for lightweight tasks: intent classification, queue routing, quick checks. Not for synthesis.

Batch processing discount · 50% off all models for async workloads
If customers can tolerate 30–60 min latency, batch mode halves their Anthropic bill. Significant for high-volume users.
Pricing implication: At $0.50 average per case on Sonnet, a customer running 100 cases/month spends ~$50/month on Anthropic. Your platform license should sit above their API cost — a $200–500/month license fee is defensible if you're delivering 100 cases of quality strategy output. At 1,000 cases/month, their API cost is ~$500; premium licensing at $1,000–2,000/month is reasonable.
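
The arithmetic above is easy to keep honest in code. A minimal cost model using the table's figures (~30K input, ~8K output tokens, Sonnet at $3/$15 per MTok):

```python
def case_cost(tokens_in: int, tokens_out: int,
              price_in: float, price_out: float) -> float:
    """Anthropic cost of one case; prices are $/MTok."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

def monthly_api_spend(cases_per_month: int, avg_case_cost: float) -> float:
    """What a customer pays Anthropic directly each month."""
    return cases_per_month * avg_case_cost
```

A typical Sonnet case lands around `case_cost(30_000, 8_000, 3, 15)` ≈ $0.21, so the $0.50 planning average leaves headroom for longer outputs and retries; 100 cases/month at that average is $50 of API spend, 1,000 cases is $500.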

Your platform infrastructure costs (USD/month)

These are your costs — what you pay to run the platform regardless of customer usage. Embedding inference runs on your DGX (already owned), which keeps this dramatically lower than a pure-cloud architecture.

Service · Pilot (1–5 customers) · Growth (10–30 customers) · Scale (50+ customers)

AWS ECS (Fargate) — API + workers · $25–50 · $80–150 · $300–600
4 services (API, queue processor, enrollment sync, admin). Pilot: 0.5 vCPU / 1GB each. $0.04048/vCPU-hr, $0.004445/GB-hr on Fargate. Scale adds replicas and memory.

AWS RDS Postgres (db.t3.medium, Multi-AZ) · $75 · $75–120 · $150–300
db.t3.medium Multi-AZ ≈ $75/mo in ap-southeast-2. Growth may need db.m5.large (~$120). Scale: db.m5.xlarge or a read replica added.

AWS SQS (job queue) · <$1 · $1–5 · $5–20
First 1M requests/month free. At 1,000 jobs/day that's ~30K/month. Negligible until very high volume.

AWS Secrets Manager · $2–5 · $5–15 · $15–40
$0.40/secret/month + $0.05 per 10K API calls. 5 customers × 1 key each = $2/mo base. Scales linearly.

AWS Application Load Balancer · $18 · $18–25 · $25–60
~$0.025/hr base + LCU charges. Fixed cost until high request volume.

AWS CloudWatch (logs + metrics) · $5–10 · $15–30 · $40–80
$0.57/GB ingested, $0.03/GB stored. Pilot: minimal logs. Scale: structured logging across all services adds up.

AWS S3 (backups, exports, source packs) · <$5 · $5–15 · $15–40
$0.023/GB/mo in ap-southeast-2. DB backups, case outputs, embedding snapshots. Grows with case volume.

AWS VPC + NAT Gateway · $35–40 · $35–50 · $50–100
NAT Gateway ≈ $0.059/hr (~$43/mo) plus data transfer. Required for private subnets. Often a surprise cost.

AWS Data Transfer (outbound) · <$5 · $10–25 · $30–80
First 100 GB/mo free. Strategy reports can be large (20–100KB each), but even 50K cases/mo totals only a few GB of output, so cost stays minimal.

Cloudflare (Pages + Workers) · $0–5 · $5–20 · $20–50
Pages: free tier covers pilot. Workers: 10M requests/mo free, then $0.30/M. The Workers Paid plan ($5/mo) handles most needs; Enterprise only if you need an SLA.

Stripe (billing infrastructure) · 1.7% + A$0.30 per txn · same · negotiate custom
No monthly fee for standard. 1.7% + A$0.30 per domestic card transaction. On a $500 license: ~$8.80/transaction. Billing module: 0.7% of billing volume on pay-as-you-go.

Tailscale (remote access mesh) · $0–18 · $18 · $18–90
Free up to 3 users. Starter $18/mo for up to 10 nodes. Essential — do not skip this.

DGX running costs (inference layer) · already owned · already owned · + colocation if needed
Electricity: ~$50–100/mo depending on load. Colocation (if needed at scale): $500–1,500/mo. If co-located, you can eliminate the NAT Gateway and some egress costs.

Total platform cost (excl. DGX power) · ~$165–215/mo · ~$250–455/mo · ~$650–1,370/mo
DGX handling inference keeps you well below $2K/mo even at significant scale — this is the core cost advantage of the hybrid architecture.

Unit economics — Pilot

1–5 customers, 50–200 cases/month total

Revenue (example)

$1,000–2,500/mo

3 customers × $500/mo avg license

Platform costs

~$200/mo

AWS infra + Cloudflare + Tailscale

Gross margin

~85–92%

Before your time. Customers pay their own Anthropic costs.

Unit economics — Growth

20 customers, 1,000–3,000 cases/month total

Revenue (example)

$8,000–15,000/mo

20 customers × $400–750/mo avg

Platform costs

~$350/mo

AWS scales modestly; DGX absorbs inference

Gross margin

~96–98%

Platform costs barely move. Nearly all revenue is margin.

Unit economics — Scale

50+ customers, 10,000+ cases/month

Revenue (example)

$25,000–60,000/mo

50 customers × $500–1,200/mo avg

Platform costs

~$1,000–1,500/mo

DGX may need colocation or supplemental cloud GPU

Gross margin

~95–97%

Software margins. The DGX investment pays off enormously here.

NAT Gateway is the hidden AWS cost. At ~$43/month plus $0.059/GB of data processed, it surprises most teams. If all your ECS tasks need internet access (Anthropic API, GDELT, Cloudflare KV), NAT costs can exceed your RDS bill at scale. Mitigation: use VPC endpoints for AWS services (S3, Secrets Manager, SQS) to bypass NAT for internal traffic.
AWS Reserved Instances cut compute 30–40%. Once you have stable baseline usage (typically after 3 months of production data), buy 1-year reserved instances for your ECS/RDS baseline. RDS db.t3.medium reserved: ~$45/mo vs $75 on-demand. Fargate Savings Plans: up to 50% off. Do not buy reserved until you know your steady-state usage.
The DGX is your biggest cost advantage. Running embedding inference (qwen2.5:3b) and option generation (qwen2.5:14b) locally means you pay zero per-token inference costs on the most compute-intensive parts of the pipeline. A cloud-only architecture doing the same work on AWS GPU instances (g4dn.xlarge at ~$0.526/hr) would add $400–2,000/mo depending on throughput. The DGX eliminates this entirely.