From DGX to Licensed Product
A complete architecture plan for deploying the Decision Intelligence Platform as a production-grade, multi-tenant licensed product — covering platform options, required components, time estimates, and the recommended path.
What exists today
The platform is a fully operational local-first decision intelligence pipeline running on the DGX. It is not a prototype: it has processed 68 real cases, indexes 84,112 chunks from 190 trusted books, and runs a live Cloudflare Worker interface, queue-based job processing, and a master strategist evaluation layer.
NVIDIA DGX
Local inference via Ollama. qwen2.5:3b embeddings (2,048 dims), qwen2.5:14b option generation. 2.5GB embedding index. All private, zero API cost for retrieval.
OpenClaw + Systemd
Queue watcher service, cron jobs, embedding watchdog. Event-driven via Cloudflare KV triggers. Fallback model chain: HTTP gateway → Claude CLI → OpenAI → Anthropic API.
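The fallback chain above (HTTP gateway → Claude CLI → OpenAI → Anthropic API) reduces to a simple pattern: try each provider in order and return the first success. A minimal sketch, with placeholder provider callables standing in for the real integrations:

```python
from typing import Callable

class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""

def call_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, callable) in order; return (name, response) from the
    first provider that succeeds. Collects errors for diagnostics."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

The real watcher adds retries, timeouts, and logging per hop; the ordering itself is the whole mechanism.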
Cloudflare Pages + Worker
Static frontend at visuals.professionalopinions.com.au. Worker handles challenge submission, KV writes, email notifications, Attio CRM upsert. Browser polls for results.
Anthropic Sonnet
Final synthesis via Claude CLI with OAuth auto-refresh. GDELT live context injection. Source-backed output with attribution tracking and compensation ledger.
Where can it run?
Full Cloud — AWS
The DGX becomes your dev/build environment. All customer-facing workloads move to AWS. ECS or EKS for containers, RDS for the database, SQS for queues, Secrets Manager for API keys. Cloudflare stays as the edge layer and interface.
Pros:
- Multi-AZ — no single point of failure
- Horizontal scaling per customer load
- Managed ops: RDS, SQS, ECS health checks
- Clean licensing model — DGX not in the critical path
- Enterprise-ready from day one
Cons:
- Higher ongoing cost vs the DGX (already paid for)
- GPU inference on AWS is expensive ($2–6/hr per instance)
- More infra to manage initially
Hybrid — DGX + Cloud wrapper
DGX handles heavy inference. AWS handles customer-facing layer: auth, billing, dashboards, API routing. Connected via Tailscale or VPN.
Pros:
- Lowest inference cost — DGX already paid for
- Maximum GPU performance for local models
- Fast to stand up
Cons:
- DGX is a single point of failure — if the box goes down, every customer is affected
- Not viable for mission-critical SLAs without a standby
- Harder to scale beyond one concurrent heavy job
Co-located DGX
Move the DGX into a data centre with proper uptime SLAs, redundant power, and high bandwidth. Wrap with cloud services for customer layer.
Pros:
- Keeps GPU performance, eliminates home-lab risk
- Colocation cost ~$500–1,500/month vs $3,000+/month for cloud GPU
- Dedicated hardware — no noisy neighbours
Cons:
- Physical logistics — shipping, racking, power
- Harder to scale horizontally
- Remote access dependent on the data centre network
NVIDIA DGX Cloud
NVIDIA's managed DGX-as-a-service. Pay-per-hour GPU access. Keeps the workload native to DGX architecture without owning hardware.
Pros:
- No hardware ownership risk
- Native DGX performance
- Scales elastically
Cons:
- Early stage — limited regions
- Expensive at scale
- Vendor lock-in to the NVIDIA platform
Production architecture (recommended path)
Customer isolation
Each customer gets their own account, isolated data namespace, and encrypted Anthropic API key stored in Secrets Manager. Per-customer rate limiting and usage tracking from day one.
Unified command centre
Single web app surfacing all platform capabilities. Feature flags control which modules each licensed customer can access. Admin panel for you to manage customers, toggle features, inspect usage.
Cost dashboard
Real-time view: tokens consumed by model, by feature, by day. Pulls from Anthropic usage API plus local instrumentation. Budget alerts and cost projections per customer.
Self-healing layer
Circuit breakers on all external APIs — if Anthropic, GDELT, or any dependency is down, the pipeline degrades gracefully instead of crashing. Queue-based async so nothing dies on worker restart.
Multi-AZ deployment
Two availability zones minimum. Health checks and auto-restart on ECS. Automated database backups with tested restore. PagerDuty or equivalent for on-call alerting.
Operator tooling
Tailscale mesh for SSH from anywhere. AWS SSM Session Manager for browser-based shell — no open ports needed. Admin API for triggering restarts, draining queues, revoking customer keys.
Stripe integration
Subscription billing with optional usage-based components. Each customer brings their own Anthropic key — they own the model cost directly. Your license fee is separate from their API spend.
Runbook + DR
Documented recovery procedures for every failure mode. Automated restore tests monthly. Configuration backups versioned in S3. No undocumented single points of failure.
Customer data isolation
Row-level security in Postgres. API keys never logged. Network isolation per tenant where possible. Audit trail for all admin actions. TLS everywhere, no plaintext secrets in environment variables.
Build it solo
Your time is the cost. AWS setup and tooling during build: ~$50–100/mo (dev environment, test deployments). No contractor spend. Slowest path but highest control.
Hire a contractor
One strong DevOps/fullstack engineer at $150–250/hr (contract rate, AU market). 300–400 hours over 3–4 months. AWS dev environment: ~$200–400 total during build. Fastest credible path to launch.
Minimum viable version
Containerise + auth + one paying customer + manual billing. Skip the dashboard, skip self-healing. ~100–150 contractor hours. Enough to validate willingness to pay before full investment.
Phase-by-phase breakdown
Contractor cost estimates assume AU market rates of $150–250/hr for a senior DevOps/fullstack engineer. The solo-cost column assumes your time only (no cash outlay beyond AWS).
| Phase | Work | Time | Solo cost | Contractor cost |
|---|---|---|---|---|
| 1 · Containerisation | Dockerise all pipeline services. Docker Compose for local dev. CI/CD pipeline (GitHub Actions → ECR → ECS). Smoke tests pass in containers before anything else. | 2–3 wks | $0 | $5–10K |
| 2 · Multi-tenancy | Auth layer (JWT + API keys). Per-customer Anthropic key vault in Secrets Manager. Data isolation in Postgres. Per-customer rate limiting and usage tracking. | 3–4 wks | $0 | $8–14K |
| 3 · Unified interface | Web app surfacing all capabilities behind auth. Feature flags per customer tier. Admin panel: manage customers, toggle features, inspect usage and attribution. | 4–6 wks | $0 | $10–20K |
| 4 · Cost dashboard | Anthropic usage API + local instrumentation. Per-customer token and cost visibility. Budget alerts. Stripe subscription billing with usage-based components. | 2–3 wks | $0 | $5–10K |
| 5 · AWS deployment | ECS on AWS, multi-AZ. Health checks, auto-restart. RDS Postgres. SQS job queue. Secrets Manager. Cloudflare stays as edge + interface layer. | 3–4 wks | $200–400 | $8–14K |
| 6 · Self-healing | Circuit breakers on Anthropic, GDELT, and all external APIs. Queue-based async throughout. PagerDuty alerting. Automated DB backup and tested restore procedure. | 2–3 wks | $0 | $5–10K |
| 7 · Remote access | Tailscale mesh across all nodes. AWS SSM Session Manager. Admin API (restart, drain, revoke). Runbook documentation for every failure mode. | 1 week | $0 | $2–4K |
| 8 · Hardening | Load testing (10× peak). Security audit or external penetration test. Customer onboarding flow. Pilot with 2–3 known customers before public launch. | 2–3 wks | $500–2K | $5–10K |
| Total | | 5–6 months | ~$700–2,400 AUD | ~$48–92K AUD |
Where to start
Containerise
- Write Dockerfiles for each service: API, queue processor, enrollment sync, embedding watchdog
- Docker Compose file that runs the full pipeline locally end-to-end
- Verify: submit a challenge, watch it process, receive a result — all inside containers
- CI pipeline: push to GitHub → build images → push to ECR
Add multi-tenancy
- Postgres schema: customers, api_keys, usage_events, jobs
- JWT auth middleware in the API service
- Secrets Manager integration — store and retrieve per-customer Anthropic keys
- First real customer account created and a job processed under their key
Deploy to AWS
- ECS cluster with the containerised services
- RDS Postgres (multi-AZ), SQS queue, Secrets Manager
- Cloudflare Worker updated to point at AWS API Gateway instead of DGX directly
- Tailscale mesh connected — SSH access to all nodes from anywhere
First pilot customer
- Onboard one known customer — they supply their Anthropic key, you supply access
- Manual billing acceptable at this stage
- Instrument everything: latency, cost per case, errors
- Fix the top 5 issues before opening to more customers
Add cost visibility
- Token visibility per customer: by model, by feature, by day
- Stripe subscriptions with usage-based metering if needed
- Budget alerts so customers know before they overspend
Harden
- Circuit breakers on all external API calls
- Automated DB backup with monthly tested restore
- Load testing at 10× expected peak
- Runbook for every failure mode documented and tested
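A starting point for the Compose file mentioned in the containerisation steps might look like the following. Service names, build paths, images, and ports are assumptions, not the platform's actual layout:

```yaml
# Illustrative layout only: service names, build paths, and ports are assumptions.
services:
  api:
    build: ./api
    ports: ["8080:8080"]
    depends_on: [postgres]
    environment:
      DATABASE_URL: postgres://app:app@postgres:5432/platform
  queue-processor:
    build: ./queue-processor
    depends_on: [postgres]
  enrollment-sync:
    build: ./enrollment-sync
    depends_on: [postgres]
  embedding-watchdog:
    build: ./embedding-watchdog
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: platform
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

The end-to-end check in the steps above (submit a challenge, watch it process, receive a result) is the acceptance test for this file: if it passes entirely inside containers, phase 1 is done.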
What it costs to run
Three scenarios: Pilot (1–5 customers, low volume), Growth (10–30 customers), Scale (50+ customers). All figures in USD/month unless noted. Anthropic API costs are not included — customers bring their own keys and pay Anthropic directly. Prices current as of April 2026.
Anthropic API — customer's own key
Each customer pays Anthropic directly. These are the costs your customers will incur per use of the platform. Understanding this is critical for positioning your license fee correctly.
| Model | Input | Output | Typical strategy case cost | Notes |
|---|---|---|---|---|
| Sonnet 4.6 (current) | $3 / MTok | $15 / MTok | ~$0.20–0.50 | Primary synthesis model. A typical case: ~30K input tokens (corpus + source pack + prompt), 8–25K output. Cost ≈ $0.09 input + $0.12–0.38 output. |
| Opus 4.6 (premium) | $5 / MTok | $25 / MTok | ~$0.35–0.80 | Reserve for premium tier / complex cases. Same token volumes cost ~1.7× Sonnet. |
| Haiku 4.5 (routing/triage) | $1 / MTok | $5 / MTok | ~$0.03–0.10 | Use for lightweight tasks: intent classification, queue routing, quick checks. Not for synthesis. |
| Batch processing discount | 50% off all models for async workloads. If customers can tolerate 30–60 min latency, batch mode halves their Anthropic bill. Significant for high-volume users. | |||
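The per-case figures in the table follow directly from per-million-token prices, and are worth being able to recompute when positioning the license fee:

```python
def case_cost_usd(input_tokens: int, output_tokens: int,
                  in_per_mtok: float, out_per_mtok: float) -> float:
    """Cost of one case at the listed per-million-token prices."""
    return (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000

# Typical Sonnet case from the table: ~30K input, ~8K output at $3/$15 per MTok
sonnet = case_cost_usd(30_000, 8_000, 3, 15)  # 0.09 + 0.12 = $0.21
batch = sonnet * 0.5                          # 50% batch discount halves it
```

The same function with $5/$25 gives the Opus figures and with $1/$5 the Haiku figures, so a customer-facing cost estimator is essentially this one line plus the model table.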
Your platform infrastructure costs (USD/month)
These are your costs — what you pay to run the platform regardless of customer usage. Embedding inference runs on your DGX (already owned), which keeps this dramatically lower than a pure-cloud architecture.
| Service | Pilot 1–5 customers | Growth 10–30 customers | Scale 50+ customers | Notes |
|---|---|---|---|---|
| AWS ECS (Fargate) — API + workers | $25–50 | $80–150 | $300–600 | 4 services (API, queue processor, enrollment sync, admin). Pilot: 0.5 vCPU / 1GB each. $0.04048/vCPU-hr, $0.004445/GB-hr on Fargate. Scale adds replicas and memory. |
| AWS RDS Postgres (db.t3.medium, Multi-AZ) | $75 | $75–120 | $150–300 | db.t3.medium Multi-AZ ≈ $75/mo in ap-southeast-2. Growth may need db.m5.large (~$120). Scale: db.m5.xlarge or read replica added. |
| AWS SQS (job queue) | <$1 | $1–5 | $5–20 | First 1M requests/month free. At 1,000 jobs/day that's ~30K/month. Negligible until very high volume. |
| AWS Secrets Manager | $2–5 | $5–15 | $15–40 | $0.40/secret/month + $0.05 per 10K API calls. 5 customers × 1 key each = $2/mo base. Scales linearly. |
| AWS Application Load Balancer | $18 | $18–25 | $25–60 | ~$0.025/hr base + LCU charges. Fixed cost until high request volume. |
| AWS CloudWatch (logs + metrics) | $5–10 | $15–30 | $40–80 | $0.57/GB ingested, $0.03/GB stored. Pilot: minimal logs. Scale: structured logging across all services adds up. |
| AWS S3 (backups, exports, source packs) | <$5 | $5–15 | $15–40 | $0.023/GB/mo in ap-southeast-2. DB backups, case outputs, embedding snapshots. Grows with case volume. |
| AWS VPC + NAT Gateway | $35–40 | $35–50 | $50–100 | NAT Gateway ≈ $0.059/hr (~$43/mo) plus data transfer. Required for private subnets. Often a surprise cost. |
| AWS Data Transfer (outbound) | <$5 | $10–25 | $30–80 | First 100 GB/mo free. Strategy reports can be large (20–100KB each). Even at 50K cases/mo that is only ~1–5GB of report output, so cost stays minimal. |
| Cloudflare (Pages + Workers) | $0–5 | $5–20 | $20–50 | Pages: free tier covers pilot. Workers: 10M requests/mo free, then $0.30/M. Scale: Workers Paid plan ($5/mo) handles most needs. Enterprise for SLA. |
| Stripe (billing infrastructure) | 1.7% + A$0.30 per txn | same | negotiate custom | No monthly fee for standard. 1.7% + $0.30 per domestic card transaction. On a $500 license: ~$8.80/transaction. Billing module: 0.7% of billing volume on pay-as-you-go. |
| Tailscale (remote access mesh) | $0–18 | $18 | $18–90 | Free up to 3 users. Starter $18/mo for up to 10 nodes. Essential — do not skip this. |
| DGX running costs (inference layer) | Already owned | Already owned | +colocation if needed | Electricity: ~$50–100/mo depending on load. Colocation (if needed at scale): $500–1,500/mo. If colocated, eliminate NAT Gateway and some egress costs. |
| Total platform cost (excl. DGX power) | ~$165–215/mo | ~$250–455/mo | ~$650–1,370/mo | DGX handling inference keeps you well below $2K/mo even at significant scale — this is the core cost advantage of the hybrid architecture. |
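As a sanity check, summing the midpoints of the pilot column reproduces the stated total (the midpoint values below are taken from the table's own ranges, not independent estimates):

```python
# Midpoints of the pilot column (USD/month), read off the table above.
pilot_midpoints = {
    "ecs_fargate": 37.5,      # $25–50
    "rds_postgres": 75.0,     # fixed
    "sqs": 0.5,               # <$1
    "secrets_manager": 3.5,   # $2–5
    "alb": 18.0,              # fixed
    "cloudwatch": 7.5,        # $5–10
    "s3": 2.5,                # <$5
    "nat_gateway": 37.5,      # $35–40
    "data_transfer": 2.5,     # <$5
    "cloudflare": 2.5,        # $0–5
    "tailscale": 9.0,         # $0–18
}
total = sum(pilot_midpoints.values())  # ~$196/mo, inside the ~$165–215 band
```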
Unit economics — Pilot
1–5 customers, 50–200 cases/month total
Revenue: $1,000–2,500/mo (3 customers × $500/mo avg license)
Infra cost: ~$200/mo (AWS infra + Cloudflare + Tailscale)
Gross margin: ~85–92%, before your time. Customers pay their own Anthropic costs.
Unit economics — Growth
20 customers, 1,000–3,000 cases/month total
Revenue: $8,000–15,000/mo (20 customers × $400–750/mo avg)
Infra cost: ~$350/mo (AWS scales modestly; DGX absorbs inference)
Gross margin: ~96–98%. Platform costs barely move. Nearly all revenue is margin.
Unit economics — Scale
50+ customers, 10,000+ cases/month
Revenue: $25,000–60,000/mo (50 customers × $500–1,200/mo avg)
Infra cost: ~$1,000–1,500/mo (DGX may need colocation or supplemental cloud GPU)
Gross margin: ~95–97%. Software margins. The DGX investment pays off enormously here.
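The margin figures in these scenarios are simply revenue minus infrastructure cost over revenue. Using midpoint figures from each scenario:

```python
def gross_margin_pct(monthly_revenue: float, monthly_infra_cost: float) -> float:
    """Gross margin as a percentage, before owner time and customer API spend."""
    return 100 * (monthly_revenue - monthly_infra_cost) / monthly_revenue

pilot  = gross_margin_pct(1_500, 200)     # 3 x $500 licenses, ~$200 infra
growth = gross_margin_pct(11_500, 350)    # 20 customers, midpoint revenue
scale  = gross_margin_pct(42_500, 1_250)  # 50 customers, midpoint figures
```

The structural point is visible in the numbers: infra cost grows far slower than revenue because the DGX absorbs inference, so margin climbs toward pure-software levels as customers are added.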