Architecture / Deployment

From DGX to Licensed Product

A complete architecture plan for deploying the Decision Intelligence Platform as a production-grade, multi-tenant licensed product — covering platform options, required components, time estimates, and the recommended path.

What exists today

The platform is a fully operational, local-first decision intelligence pipeline running on the DGX. It is not a prototype: it has processed 68 real cases, holds 84,112 indexed chunks from 190 trusted books, and runs a live Cloudflare Worker interface, queue-based job processing, and a master strategist evaluation layer.

Compute

NVIDIA DGX

Local inference via Ollama. qwen2.5:3b embeddings (2,048 dims), qwen2.5:14b option generation. 2.5GB embedding index. All private, zero API cost for retrieval.

Orchestration

OpenClaw + Systemd

Queue watcher service, cron jobs, embedding watchdog. Event-driven via Cloudflare KV triggers. Fallback model chain: HTTP gateway → Claude CLI → OpenAI → Anthropic API.

Interface

Cloudflare Pages + Worker

Static frontend at visuals.professionalopinions.com.au. Worker handles challenge submission, KV writes, email notifications, Attio CRM upsert. Browser polls for results.

Synthesis

Anthropic Sonnet

Final synthesis via Claude CLI with OAuth auto-refresh. GDELT live context injection. Source-backed output with attribution tracking and compensation ledger.

Where can it run?

Viable with caveats
Option B

Hybrid — DGX + Cloud wrapper

DGX handles heavy inference. AWS handles customer-facing layer: auth, billing, dashboards, API routing. Connected via Tailscale or VPN.

Strengths
  • Lowest inference cost — DGX already paid for
  • Maximum GPU performance for local models
  • Fast to stand up
Tradeoffs
  • DGX is a single point of failure
  • If the box goes down, every customer is affected
  • Not viable for mission-critical SLAs without a standby
  • Harder to scale beyond one concurrent heavy job
Operational complexity
Option C

Co-located DGX

Move the DGX into a data centre with proper uptime SLAs, redundant power, and high bandwidth. Wrap with cloud services for customer layer.

Strengths
  • Keeps GPU performance, eliminates home-lab risk
  • Colocation cost ~$500–1,500/month vs $3,000+/month cloud GPU
  • Dedicated hardware — no noisy neighbours
Tradeoffs
  • Physical logistics — shipping, racking, power
  • Harder to scale horizontally
  • Remote access dependent on data centre network
Emerging
Option D

NVIDIA DGX Cloud

NVIDIA's managed DGX-as-a-service. Pay-per-hour GPU access. Keeps the workload native to DGX architecture without owning hardware.

Strengths
  • No hardware ownership risk
  • Native DGX performance
  • Scales elastically
Tradeoffs
  • Early stage — limited regions
  • Expensive at scale
  • Vendor lock-in to NVIDIA platform

Production architecture (recommended path)

CUSTOMER BROWSER
      │
      ▼
CLOUDFLARE EDGE (Pages + Workers)
      │  Static frontend · Challenge submission · Result polling
      │  Per-customer auth token validation
      ▼
AWS API GATEWAY  ← JWT auth ← Customer identity
      │
      ├──▶ ECS / EKS CLUSTER
      │     │
      │     ├── API service (FastAPI)          ← handles job intake, result delivery
      │     ├── Queue processor workers        ← reads SQS, runs pipeline
      │     ├── Enrollment sync service        ← KV → users.jsonl
      │     └── Admin service (internal only)  ← manage customers, toggle features
      │
      ├──▶ AWS SQS               ← durable job queue (survives worker restarts)
      │
      ├──▶ AWS RDS (Postgres)    ← customer accounts, billing, usage, attribution
      │
      ├──▶ AWS SECRETS MANAGER   ← per-customer Anthropic API keys (encrypted at rest)
      │
      └──▶ INFERENCE LAYER
            │
            ├── DGX (local / colo)
            │     Ollama · qwen2.5:3b embeddings · qwen2.5:14b options
            │     LanceDB / JSONL index · 84,112 chunks
            │
            └── AWS GPU fallback (g4dn.xlarge / p3)
                  Triggered only if DGX unavailable

GDELT ─────────────▶ live context injection (per-case)
ANTHROPIC API ─────▶ Sonnet synthesis (customer's own key)
STRIPE ────────────▶ subscription billing + usage metering
TAILSCALE ─────────▶ remote access to DGX + all nodes
Multi-tenancy

Customer isolation

Each customer gets their own account, isolated data namespace, and encrypted Anthropic API key stored in Secrets Manager. Per-customer rate limiting and usage tracking from day one.
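
A minimal sketch of the per-customer key lookup this implies, assuming one Secrets Manager secret per tenant. The secret naming scheme, the JSON payload shape, and the cache TTL are illustrative choices, not fixed platform conventions; `client` stands in for a boto3 Secrets Manager client (anything exposing `get_secret_value(SecretId=...)`), injected so the logic can be exercised without AWS:

```python
import json
import time

class TenantKeyStore:
    """Per-customer Anthropic API keys, one Secrets Manager secret per tenant."""

    def __init__(self, client, prefix="customers", ttl_seconds=300):
        self.client = client          # boto3 Secrets Manager client in production
        self.prefix = prefix
        self.ttl = ttl_seconds
        self._cache = {}              # tenant_id -> (expires_at, key)

    def secret_id(self, tenant_id: str) -> str:
        # Illustrative naming convention: customers/<tenant>/anthropic-api-key
        return f"{self.prefix}/{tenant_id}/anthropic-api-key"

    def get_key(self, tenant_id: str) -> str:
        # Serve from a short-lived cache so we don't pay the Secrets
        # Manager per-API-call charge on every single job.
        hit = self._cache.get(tenant_id)
        if hit and hit[0] > time.time():
            return hit[1]
        resp = self.client.get_secret_value(SecretId=self.secret_id(tenant_id))
        key = json.loads(resp["SecretString"])["api_key"]
        self._cache[tenant_id] = (time.time() + self.ttl, key)
        return key

    def revoke(self, tenant_id: str) -> None:
        # Drop the cached copy immediately; rotating or deleting the
        # secret itself happens through the Secrets Manager API.
        self._cache.pop(tenant_id, None)
```

Caching matters because Secrets Manager bills per API call, and `revoke` drops the cached copy so a rotated key stops being used within one request rather than one TTL.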

Interface

Unified command centre

Single web app surfacing all platform capabilities. Feature flags control which modules each licensed customer can access. Admin panel for you to manage customers, toggle features, inspect usage.

Token visibility

Cost dashboard

Real-time view: tokens consumed by model, by feature, by day. Pulls from Anthropic usage API plus local instrumentation. Budget alerts and cost projections per customer.
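
The aggregation behind such a dashboard can be sketched as a pure roll-up over usage events, priced per MTok as in the Anthropic pricing table later in this document. The event schema and model identifiers here are hypothetical placeholders for the local instrumentation:

```python
from collections import defaultdict

# $/MTok (input, output) — Sonnet/Opus/Haiku list prices; keys are placeholders.
PRICES = {
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.6": (5.0, 25.0),
    "haiku-4.5": (1.0, 5.0),
}

def event_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost of one model call."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

def rollup(events):
    """Aggregate usage events into (customer, day, model) -> USD.

    Each event is a dict with customer, day, model, tokens_in and
    tokens_out — a hypothetical shape for rows in usage_events.
    """
    totals = defaultdict(float)
    for e in events:
        totals[(e["customer"], e["day"], e["model"])] += event_cost(
            e["model"], e["tokens_in"], e["tokens_out"])
    return dict(totals)
```

The same roll-up keyed by feature instead of model gives the by-feature view, and a daily sum per customer is what budget alerts compare against.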

Stability

Self-healing layer

Circuit breakers on all external APIs — if Anthropic, GDELT, or any dependency is down, the pipeline degrades gracefully instead of crashing. Queue-based async so nothing dies on worker restart.
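
A circuit breaker here can be as small as a failure counter and a timestamp. A minimal sketch, with illustrative (untuned) thresholds and an injected clock for testability:

```python
import time

class CircuitOpen(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    """Minimal circuit breaker for one external dependency.

    After max_failures consecutive failures the circuit opens: calls
    fail fast with CircuitOpen until reset_after seconds have passed,
    then one trial call is let through (half-open).
    """

    def __init__(self, max_failures=5, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("dependency marked down; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

Wrap each external call site (Anthropic synthesis, GDELT fetch) in its own breaker so one failing dependency degrades only its feature, and re-queue the job for retry when `CircuitOpen` is raised.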

Reliability

Multi-AZ deployment

Two availability zones minimum. Health checks and auto-restart on ECS. Automated database backups with tested restore. PagerDuty or equivalent for on-call alerting.

Remote access

Operator tooling

Tailscale mesh for SSH from anywhere. AWS SSM Session Manager for browser-based shell — no open ports needed. Admin API for triggering restarts, draining queues, revoking customer keys.

Billing

Stripe integration

Subscription billing with optional usage-based components. Each customer brings their own Anthropic key — they own the model cost directly. Your license fee is separate from their API spend.

Recovery

Runbook + DR

Documented recovery procedures for every failure mode. Automated restore tests monthly. Configuration backups versioned in S3. No undocumented single points of failure.

Security

Customer data isolation

Row-level security in Postgres. API keys never logged. Network isolation per tenant where possible. Audit trail for all admin actions. TLS everywhere, no plaintext secrets in environment variables.
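
One common Postgres pattern for this, sketched under the assumption that tenant scoping rides on a session setting: enable RLS per table, then set the tenant id per transaction from the authenticated request. The `app.current_tenant` setting name and `jobs` table are illustrative, not the platform's actual schema:

```python
import re

# Illustrative one-time DDL: every query on jobs is filtered to the
# tenant named in the app.current_tenant session setting.
RLS_DDL = """
ALTER TABLE jobs ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON jobs
    USING (customer_id = current_setting('app.current_tenant'));
"""

_TENANT_RE = re.compile(r"^[a-z0-9_-]{1,64}$")

def set_tenant_sql(tenant_id: str) -> str:
    """Build the per-transaction SET LOCAL statement.

    The id is validated against a strict allow-list first, so the value
    cannot smuggle SQL even though it is interpolated into the statement.
    """
    if not _TENANT_RE.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"SET LOCAL app.current_tenant = '{tenant_id}'"
```

The API layer runs `set_tenant_sql(...)` at the start of every transaction; after that, application code cannot read another tenant's rows even if a query forgets its WHERE clause.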

Option A — Solo build (you)
$0
labour cost
5–6 months

Your time is the cost. AWS setup and tooling during build: ~$50–100/mo (dev environment, test deployments). No contractor spend. Slowest path but highest control.

Option B — You + 1 engineer (contractor)
$40–80K
total build cost (AUD)
3–4 months

One strong DevOps/fullstack engineer at $150–250/hr (contract rate, AU market). 300–400 hours over 3–4 months. AWS dev environment: ~$200–400 total during build. Fastest credible path to launch.

Option C — MVP only (fastest to revenue)
$15–30K
total build cost (AUD)
6–8 weeks

Containerise + auth + one paying customer + manual billing. Skip dashboard, skip self-healing. ~100–150 contractor hours. Enough to validate willingness to pay before full investment.

Phase-by-phase breakdown

Contractor cost estimates assume AU market rate of $150–250/hr for a senior DevOps/fullstack engineer. Solo columns assume your time only (no cash outlay beyond AWS).

Phase 1 · Containerisation · 2–3 wks · solo $0 · contractor $5–10K
Dockerise all pipeline services. Docker Compose for local dev. CI/CD pipeline (GitHub Actions → ECR → ECS). Smoke tests pass in containers before anything else.

Phase 2 · Multi-tenancy · 3–4 wks · solo $0 · contractor $8–14K
Auth layer (JWT + API keys). Per-customer Anthropic key vault in Secrets Manager. Data isolation in Postgres. Per-customer rate limiting and usage tracking.

Phase 3 · Unified interface · 4–6 wks · solo $0 · contractor $10–20K
Web app surfacing all capabilities behind auth. Feature flags per customer tier. Admin panel: manage customers, toggle features, inspect usage and attribution.

Phase 4 · Cost dashboard · 2–3 wks · solo $0 · contractor $5–10K
Anthropic usage API + local instrumentation. Per-customer token and cost visibility. Budget alerts. Stripe subscription billing with usage-based components.

Phase 5 · AWS deployment · 3–4 wks · solo $200–400 · contractor $8–14K
ECS on AWS, multi-AZ. Health checks, auto-restart. RDS Postgres. SQS job queue. Secrets Manager. Cloudflare stays as edge + interface layer.

Phase 6 · Self-healing · 2–3 wks · solo $0 · contractor $5–10K
Circuit breakers on Anthropic, GDELT, and all external APIs. Queue-based async throughout. PagerDuty alerting. Automated DB backup and tested restore procedure.

Phase 7 · Remote access · 1 wk · solo $0 · contractor $2–4K
Tailscale mesh across all nodes. AWS SSM Session Manager. Admin API (restart, drain, revoke). Runbook documentation for every failure mode.

Phase 8 · Hardening · 2–3 wks · solo $500–2K · contractor $5–10K
Load testing (10× peak). Security audit or external penetration test. Customer onboarding flow. Pilot with 2–3 known customers before public launch.

Total · 5–6 months · solo ~$700–2,400 AUD · contractor ~$48–92K AUD
AWS infrastructure during build is modest: a dev ECS cluster, RDS dev instance, and S3 bucket cost ~$50–100/mo. Run it for 3–6 months during development = $150–600 total before launch. Not a material build cost.
Start with ECS, not Kubernetes. Kubernetes adds significant operational burden — cluster management, node pools, YAML sprawl. ECS (AWS Elastic Container Service) is production-grade, simpler, and handles self-healing containers, auto-scaling, and rolling deploys without the overhead. Migrate to EKS only if you genuinely hit problems ECS cannot solve.
Phase 1 priority: containerise first, cloud second. Get everything running in Docker locally before touching AWS. If it doesn't run cleanly in a container, it won't run cleanly in the cloud either. This also lets you test the full pipeline in isolation — embedding, retrieval, synthesis, queue — before the network and auth complexity arrive.
The DGX is an asset, not a liability — use it right. The embedding index (2.5GB, 84K chunks) is the core IP. Keep local inference on the DGX or co-located hardware. Cloud GPU instances are expensive and unnecessary for inference your hardware already handles. Use AWS for stateless API workers, auth, and databases only.
Customer brings their own Anthropic key — design around this early. This is the right model: they own the model cost, you own the platform. But it means your auth and secrets layer must handle key storage, rotation, and revocation cleanly from day one. Do not retrofit this — it touches every part of the pipeline.
Define "mission critical" before building for it. True mission-critical (99.9%+ uptime SLA) requires redundant DGX or a hot standby, multi-AZ everything, and tested failover. That is materially more expensive and complex than "highly available." For a v1 licensing product, "highly available with documented recovery" (99.5%) is likely sufficient and far cheaper to build.
Build observability before features. Before you add licensed modules or customer-facing features, instrument the pipeline: request latency, model call success/failure rates, queue depth, embedding hit rates, cost per case. You cannot debug or sell a system you cannot measure.

Where to start

Containerise the pipeline · Week 1–3
  • Write Dockerfiles for each service: API, queue processor, enrollment sync, embedding watchdog
  • Docker Compose file that runs the full pipeline locally end-to-end
  • Verify: submit a challenge, watch it process, receive a result — all inside containers
  • CI pipeline: push to GitHub → build images → push to ECR
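
The verify step above can be automated as a tiny smoke test. A sketch with injected HTTP callables so it can run against the local Compose stack or in CI; the endpoint paths and response fields are hypothetical, not the platform's actual API contract:

```python
import time

def submit_and_poll(post, get, challenge, timeout=120.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """End-to-end smoke check: submit a challenge, poll until a result.

    post/get are injected callables (e.g. thin wrappers over requests
    pointed at the local stack); clock/sleep are injectable for tests.
    """
    job = post("/challenges", challenge)           # -> {"job_id": ...}
    deadline = clock() + timeout
    while clock() < deadline:
        status = get(f"/jobs/{job['job_id']}")     # -> {"state": ..., "result": ...}
        if status["state"] == "done":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(f"job failed: {status}")
        sleep(interval)
    raise TimeoutError("no result before deadline")
```

Run it after `docker compose up` and again in CI after every image build; a green run proves intake, queue, pipeline, and result delivery all work inside containers.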
Add auth and multi-tenancy skeleton · Week 3–6
  • Postgres schema: customers, api_keys, usage_events, jobs
  • JWT auth middleware in the API service
  • Secrets Manager integration — store and retrieve per-customer Anthropic keys
  • First real customer account created and a job processed under their key
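
What the JWT middleware has to check can be shown with a stdlib-only HS256 verifier. This is a sketch of the checks, not a recommendation to hand-roll: in production use a maintained library (e.g. PyJWT) instead:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt(token: str, secret: bytes, now=None) -> dict:
    """Verify an HS256 JWT and return its claims, or raise ValueError."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":   # never honour attacker-chosen algs
        raise ValueError("unsupported alg")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and (now or time.time()) >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The two non-negotiables the sketch illustrates: pin the algorithm server-side, and compare signatures in constant time.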
Deploy to AWS · Week 6–9
  • ECS cluster with the containerised services
  • RDS Postgres (multi-AZ), SQS queue, Secrets Manager
  • Cloudflare Worker updated to point at AWS API Gateway instead of DGX directly
  • Tailscale mesh connected — SSH access to all nodes from anywhere
Pilot with first customer · Week 9–12
  • Onboard one known customer — they supply their Anthropic key, you supply access
  • Manual billing acceptable at this stage
  • Instrument everything: latency, cost per case, errors
  • Fix the top 5 issues before opening to more customers
Cost dashboard + Stripe · Week 10–14
  • Token visibility per customer: by model, by feature, by day
  • Stripe subscriptions with usage-based metering if needed
  • Budget alerts so customers know before they overspend
Self-healing and hardening · Week 12–18
  • Circuit breakers on all external API calls
  • Automated DB backup with monthly tested restore
  • Load testing at 10× expected peak
  • Runbook for every failure mode documented and tested

What it costs to run

Three scenarios: Pilot (1–5 customers, low volume), Growth (10–30 customers), Scale (50+ customers). All figures in USD/month unless noted. Anthropic API costs are not included — customers bring their own keys and pay Anthropic directly. Prices current as of April 2026.

Anthropic API — customer's own key

Each customer pays Anthropic directly. These are the costs your customers will incur per use of the platform. Understanding this is critical for positioning your license fee correctly.

Model · Input · Output · Typical strategy case cost

Sonnet 4.6 (current) · $3 / MTok · $15 / MTok · ~$0.30–0.80
Primary synthesis model. A typical case: ~30K input tokens (corpus + source pack + prompt), ~8K output. Cost ≈ $0.09 input + $0.12–0.40 output.

Opus 4.6 (premium) · $5 / MTok · $25 / MTok · ~$0.75–2.00
Reserve for premium tier / complex cases. At these list prices the same token volumes cost ~1.7× Sonnet; the higher end of the range reflects the longer, more complex cases typically routed to Opus.

Haiku 4.5 (routing/triage) · $1 / MTok · $5 / MTok · ~$0.03–0.10
Use for lightweight tasks: intent classification, queue routing, quick checks. Not for synthesis.

Batch processing discount · 50% off all models for async workloads
If customers can tolerate 30–60 min latency, batch mode halves their Anthropic bill. Significant for high-volume users.
Pricing implication: At $0.50 average per case on Sonnet, a customer running 100 cases/month spends ~$50/month on Anthropic. Your platform license should sit above their API cost — a $200–500/month license fee is defensible if you're delivering 100 cases of quality strategy output. At 1,000 cases/month, their API cost is ~$500; premium licensing at $1,000–2,000/month is reasonable.
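
The arithmetic above is easy to keep honest in code. A minimal cost model using the table's figures (~30K input, ~8K output tokens, Sonnet at $3/$15 per MTok):

```python
def case_cost(tokens_in: int, tokens_out: int,
              price_in: float, price_out: float) -> float:
    """Anthropic cost of one case; prices are $/MTok."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

def monthly_api_spend(cases_per_month: int, avg_case_cost: float) -> float:
    """What a customer pays Anthropic directly each month."""
    return cases_per_month * avg_case_cost
```

A typical Sonnet case lands around `case_cost(30_000, 8_000, 3, 15)` ≈ $0.21, so the $0.50 planning average leaves headroom for longer outputs and retries; 100 cases/month at that average is $50 of API spend, 1,000 cases is $500.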

Your platform infrastructure costs (USD/month)

These are your costs — what you pay to run the platform regardless of customer usage. Embedding inference runs on your DGX (already owned), which keeps this dramatically lower than a pure-cloud architecture.

Service · Pilot (1–5 customers) · Growth (10–30 customers) · Scale (50+ customers)

AWS ECS (Fargate) — API + workers · $25–50 · $80–150 · $300–600
4 services (API, queue processor, enrollment sync, admin). Pilot: 0.5 vCPU / 1GB each. $0.04048/vCPU-hr, $0.004445/GB-hr on Fargate. Scale adds replicas and memory.

AWS RDS Postgres (db.t3.medium, Multi-AZ) · $75 · $75–120 · $150–300
db.t3.medium Multi-AZ ≈ $75/mo in ap-southeast-2. Growth may need db.m5.large (~$120). Scale: db.m5.xlarge or a read replica added.

AWS SQS (job queue) · <$1 · $1–5 · $5–20
First 1M requests/month free. At 1,000 jobs/day that's ~30K/month. Negligible until very high volume.

AWS Secrets Manager · $2–5 · $5–15 · $15–40
$0.40/secret/month + $0.05 per 10K API calls. 5 customers × 1 key each = $2/mo base. Scales linearly.

AWS Application Load Balancer · $18 · $18–25 · $25–60
~$0.025/hr base + LCU charges. Fixed cost until high request volume.

AWS CloudWatch (logs + metrics) · $5–10 · $15–30 · $40–80
$0.57/GB ingested, $0.03/GB stored. Pilot: minimal logs. Scale: structured logging across all services adds up.

AWS S3 (backups, exports, source packs) · <$5 · $5–15 · $15–40
$0.023/GB/mo in ap-southeast-2. DB backups, case outputs, embedding snapshots. Grows with case volume.

AWS VPC + NAT Gateway · $35–40 · $35–50 · $50–100
NAT Gateway ≈ $0.059/hr (~$43/mo) plus data transfer. Required for private subnets. Often a surprise cost.

AWS Data Transfer (outbound) · <$5 · $10–25 · $30–80
First 100 GB/mo free. Strategy reports can be large (20–100KB each), but even 50K cases/mo totals only a few GB of output, so cost stays minimal.

Cloudflare (Pages + Workers) · $0–5 · $5–20 · $20–50
Pages: free tier covers pilot. Workers: 10M requests/mo free, then $0.30/M. The Workers Paid plan ($5/mo) handles most needs; Enterprise only if you need an SLA.

Stripe (billing infrastructure) · 1.7% + A$0.30 per txn · same · negotiate custom
No monthly fee for standard. 1.7% + A$0.30 per domestic card transaction. On a $500 license: ~$8.80/transaction. Billing module: 0.7% of billing volume on pay-as-you-go.

Tailscale (remote access mesh) · $0–18 · $18 · $18–90
Free up to 3 users. Starter $18/mo for up to 10 nodes. Essential — do not skip this.

DGX running costs (inference layer) · already owned · already owned · + colocation if needed
Electricity: ~$50–100/mo depending on load. Colocation (if needed at scale): $500–1,500/mo. If co-located, you can eliminate the NAT Gateway and some egress costs.

Total platform cost (excl. DGX power) · ~$165–215/mo · ~$250–455/mo · ~$650–1,370/mo
DGX handling inference keeps you well below $2K/mo even at significant scale — this is the core cost advantage of the hybrid architecture.

Unit economics — Pilot

1–5 customers, 50–200 cases/month total

Revenue (example)

$1,000–2,500/mo

3 customers × $500/mo avg license

Platform costs

~$200/mo

AWS infra + Cloudflare + Tailscale

Gross margin

~85–92%

Before your time. Customers pay their own Anthropic costs.

Unit economics — Growth

20 customers, 1,000–3,000 cases/month total

Revenue (example)

$8,000–15,000/mo

20 customers × $400–750/mo avg

Platform costs

~$350/mo

AWS scales modestly; DGX absorbs inference

Gross margin

~96–98%

Platform costs barely move. Nearly all revenue is margin.

Unit economics — Scale

50+ customers, 10,000+ cases/month

Revenue (example)

$25,000–60,000/mo

50 customers × $500–1,200/mo avg

Platform costs

~$1,000–1,500/mo

DGX may need colocation or supplemental cloud GPU

Gross margin

~95–97%

Software margins. The DGX investment pays off enormously here.

NAT Gateway is the hidden AWS cost. At ~$43/month plus $0.059/GB of data processed, it surprises most teams. If all your ECS tasks need internet access (Anthropic API, GDELT, Cloudflare KV), NAT costs can exceed your RDS bill at scale. Mitigation: use VPC endpoints for AWS services (S3, Secrets Manager, SQS) to bypass NAT for internal traffic.
AWS Reserved Instances cut compute 30–40%. Once you have stable baseline usage (typically after 3 months of production data), buy 1-year reserved instances for your ECS/RDS baseline. RDS db.t3.medium reserved: ~$45/mo vs $75 on-demand. Fargate Savings Plans: up to 50% off. Do not buy reserved until you know your steady-state usage.
The DGX is your biggest cost advantage. Running embedding inference (qwen2.5:3b) and option generation (qwen2.5:14b) locally means you pay zero per-token inference costs on the most compute-intensive parts of the pipeline. A cloud-only architecture doing the same work on AWS GPU instances (g4dn.xlarge at ~$0.526/hr) would add $400–2,000/mo depending on throughput. The DGX eliminates this entirely.