
Architecture

Version: 0.1.0

DecisionBox has three services, one database, and a plugin system for extensibility. There are no message queues, caches, or event streams — just MongoDB.

System Overview

┌─────────────────────────────────────────────────────────┐
│  Dashboard (Next.js 16)                                 │
│  http://localhost:3000                                  │
│                                                         │
│  - Project management (create, edit, delete)            │
│  - Discovery results (insights table, recommendations)  │
│  - Live progress (real-time step feed)                  │
│  - Prompt editor (markdown, per-project)                │
│  - Settings (warehouse, LLM, secrets, schedule)         │
│  - Feedback (like/dislike insights + recommendations)   │
│                                                         │
│  All /api/* requests proxied to API via Next.js         │
│  middleware (server-side, API never exposed publicly)   │
└──────────────────────────┬──────────────────────────────┘
                           │ HTTP proxy (runtime, not build-time)
                           ▼
┌─────────────────────────────────────────────────────────┐
│  API (Go, net/http)                                     │
│  http://localhost:8080                                  │
│                                                         │
│  - REST endpoints (projects, discoveries, prompts,      │
│    feedback, pricing, secrets, health)                  │
│  - Spawns agent as subprocess (local) or K8s Job (prod) │
│  - Reads provider metadata for dynamic UI forms         │
│  - Seeds pricing from registered providers              │
│  - No authentication (open-source, internal use)        │
└──────┬──────────────────────────────────────┬───────────┘
       │ exec.Command / K8s Job               │ MongoDB driver
       ▼                                      ▼
┌──────────────────────┐              ┌──────────────────┐
│  Agent (Go binary)   │              │  MongoDB 7+      │
│                      │──────write──▶│                  │
│  Autonomous AI       │              │  Collections:    │
│  data explorer       │              │  - projects      │
│                      │              │  - discoveries   │
│  Components:         │              │  - discovery_runs│
│  - LLM provider      │              │  - feedback      │
│  - Warehouse prov.   │              │  - secrets       │
│  - Domain pack       │              │  - pricing       │
│  - Secret provider   │              │  - project_ctx   │
│  - Prompts           │              │  - debug_logs    │
└──────────┬───────────┘              └──────────────────┘
           │ SQL queries
           ▼
┌──────────────────────┐
│  Data Warehouse      │
│                      │
│  BigQuery            │
│  Amazon Redshift     │
│  (read-only access)  │
└──────────────────────┘

Components

Dashboard

The web UI. Built with Next.js 16, React 19, TypeScript, and Mantine 8.

Key design decision: The dashboard proxies all /api/* requests to the API via Next.js middleware. The API is never exposed publicly. This means:

  • No CORS issues
  • Single ingress point (only the dashboard needs a public URL)
  • API URL is a runtime environment variable (API_URL), not baked at build time
  • One Docker image works across all environments

API

The REST API. Built with Go's standard net/http package (no frameworks). Handles:

  • Project CRUD — Create, read, update, delete projects
  • Discovery management — Trigger runs, list results, get status
  • Agent spawning — Starts the agent as a subprocess or Kubernetes Job
  • Provider metadata — Returns available LLM/warehouse providers with config field definitions for dynamic UI forms
  • Prompts — Read/write per-project prompt overrides
  • Secrets — Per-project encrypted key storage
  • Feedback — Like/dislike on insights and recommendations
  • Health — Liveness and readiness probes

The API has no authentication in v0.1.0. It's designed for internal use — the dashboard sits in front of it.
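To make the "standard net/http, no frameworks" point concrete, here is a minimal sketch of liveness and readiness probes on a plain `http.ServeMux`. The route paths (`/health/live`, `/health/ready`), the `readyCheck` hook, and the JSON body shape are assumptions for illustration; the document only says that health probes exist.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readyCheck reports whether a dependency (for example MongoDB) is reachable.
// Hypothetical hook: the real API's readiness logic is not shown in this document.
type readyCheck func() error

// errDown is a stand-in failure used to demonstrate the "not ready" path.
var errDown = errors.New("mongodb unreachable")

// newMux wires both probes onto a standard library ServeMux.
func newMux(check readyCheck) *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to serve requests.
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, r *http.Request) {
		writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
	})
	// Readiness: dependencies are reachable, so traffic can be routed here.
	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, r *http.Request) {
		if err := check(); err != nil {
			writeJSON(w, http.StatusServiceUnavailable, map[string]string{"status": "unavailable"})
			return
		}
		writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
	})
	return mux
}

// writeJSON sends a small JSON body with the given status code.
func writeJSON(w http.ResponseWriter, code int, body map[string]string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(body)
}

// probe issues one in-memory GET against the mux and returns the status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	healthy := newMux(func() error { return nil })
	degraded := newMux(func() error { return errDown })
	fmt.Println(probe(healthy, "/health/ready"), probe(degraded, "/health/ready"))
}
```

Separating the two probes lets an orchestrator restart a hung process (liveness) without sending traffic to one that has merely lost its database connection (readiness).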

Agent

The autonomous AI data explorer. A standalone Go binary that:

  1. Loads project configuration from MongoDB
  2. Initializes providers (LLM, warehouse, secrets, domain pack)
  3. Discovers warehouse table schemas
  4. Runs autonomous exploration (AI writes SQL, executes, iterates)
  5. Analyzes results per analysis area
  6. Validates insights against warehouse data
  7. Generates recommendations
  8. Saves results to MongoDB
  9. Updates run status throughout

The agent is stateless — it reads everything from MongoDB and the domain pack files. It can run as:

  • A subprocess spawned by the API (local development, RUNNER_MODE=subprocess)
  • A Kubernetes Job created by the API (production, RUNNER_MODE=kubernetes)
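The two run modes can be sketched as one interface with two implementations, selected from `RUNNER_MODE`. Everything except the variable name and its two values is an assumption: the `Runner` interface, the `--project-id` flag, the agent binary path, and the choice to treat an unset variable as `subprocess` are hypothetical, and the Kubernetes branch is left as a stub.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Runner starts one agent run for a project (hypothetical interface).
type Runner interface {
	Name() string
	Start(projectID string) error
}

// subprocessRunner launches the agent binary as a child process.
type subprocessRunner struct{ binary string }

func (r subprocessRunner) Name() string { return "subprocess" }

func (r subprocessRunner) Start(projectID string) error {
	cmd := exec.Command(r.binary, "--project-id", projectID) // flag name is an assumption
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Start() // do not block the API request on the full run
}

// kubernetesRunner would create a Job through the Kubernetes API.
type kubernetesRunner struct{}

func (kubernetesRunner) Name() string { return "kubernetes" }

func (kubernetesRunner) Start(projectID string) error {
	return fmt.Errorf("kubernetes runner is not implemented in this sketch")
}

// runnerFor maps a RUNNER_MODE value to a Runner.
func runnerFor(mode string) (Runner, error) {
	switch mode {
	case "", "subprocess": // assumption: an unset variable means local mode
		return subprocessRunner{binary: "./agent"}, nil
	case "kubernetes":
		return kubernetesRunner{}, nil
	default:
		return nil, fmt.Errorf("unsupported RUNNER_MODE %q", mode)
	}
}

func main() {
	r, err := runnerFor(os.Getenv("RUNNER_MODE"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected runner:", r.Name())
}
```

Because the agent is stateless, the caller only needs to pass an identifier; either runner can start it without handing over any in-memory state.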

MongoDB

The only infrastructure dependency. Stores:

| Collection | Purpose |
|---|---|
| projects | Project configuration (name, warehouse, LLM, schedule, profile, prompts) |
| discoveries | Discovery results (insights, recommendations, logs, validation) |
| discovery_runs | Live run status (phase, progress, steps, errors) |
| feedback | User feedback on insights and recommendations |
| secrets | Encrypted per-project secrets (API keys, credentials) |
| pricing | LLM and warehouse pricing configuration |
| project_context | Rolling context (previous insights, patterns) |
| discovery_debug_logs | Detailed debug logs (TTL: 30 days) |

All collections and indexes are created automatically on API startup (idempotent).

Plugin Architecture

DecisionBox is built on four plugin systems. Each uses the same pattern: providers register themselves via init() functions, and services select them by name at runtime.

How Registration Works

// In a provider package (e.g., providers/llm/claude/provider.go)
func init() {
	llm.Register("claude", func(cfg llm.ProviderConfig) (llm.Provider, error) {
		return NewClaudeProvider(cfg["api_key"], cfg["model"])
	})
}

// In a service (e.g., services/agent/main.go)
import _ "github.com/decisionbox-io/decisionbox/providers/llm/claude" // triggers init()

provider, err := llm.NewProvider("claude", llm.ProviderConfig{
	"api_key": "sk-ant-...",
	"model":   "claude-sonnet-4-20250514",
})

Services import provider packages with blank imports (_). The init() function runs at startup and registers the provider factory. The service then creates providers by name.
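The registry side of this pattern is not shown above, so here is a self-contained sketch of what a `Register` / `NewProvider` pair could look like. It is an illustration under assumptions, not the actual `llm` package: the mutex, the panic on duplicate names, the one-method `Provider` interface, and the `echo` provider are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ProviderConfig mirrors the map-style config used in the snippet above.
type ProviderConfig map[string]string

// Provider is reduced to one method for the sake of the sketch.
type Provider interface{ Name() string }

// Factory builds a provider from its config.
type Factory func(cfg ProviderConfig) (Provider, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register stores a factory under a name; intended to be called from init().
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("provider registered twice: " + name) // assumption: duplicates are a programming error
	}
	factories[name] = f
}

// NewProvider looks up a factory by name and invokes it.
func NewProvider(name string, cfg ProviderConfig) (Provider, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q (registered: %v)", name, registered())
	}
	return f(cfg)
}

// registered returns the known provider names in sorted order.
func registered() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// echoProvider is a hypothetical provider used only to exercise the registry.
type echoProvider struct{ model string }

func (p echoProvider) Name() string { return "echo/" + p.model }

func init() {
	Register("echo", func(cfg ProviderConfig) (Provider, error) {
		if cfg["model"] == "" {
			return nil, fmt.Errorf("echo: model is required")
		}
		return echoProvider{model: cfg["model"]}, nil
	})
}

func main() {
	p, err := NewProvider("echo", ProviderConfig{"model": "demo"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Name())
}
```

The practical consequence of this design is that adding a provider touches only two places: the new package with its `init()` and one blank import in each service that should offer it.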

Four Plugin Types

| Plugin | Interface | Purpose | Shipped Implementations |
|---|---|---|---|
| LLM | llm.Provider | AI model access | claude, openai, ollama, vertex-ai, bedrock |
| Warehouse | warehouse.Provider | Data warehouse access | bigquery, redshift |
| Secrets | secrets.Provider | Encrypted key storage | mongodb, gcp, aws |
| Domain Pack | domainpack.DiscoveryPack | Domain-specific analysis | gaming (match-3) |

For details on implementing each, see:

Data Flow

Discovery Run

1. User clicks "Run discovery" in Dashboard

2. Dashboard sends POST /api/v1/projects/{id}/discover

3. API creates a run record in MongoDB (status: pending)

4. API spawns agent (subprocess or K8s Job)

5. Agent loads project config, secrets, prompts from MongoDB

6. Agent initializes LLM provider, warehouse provider, domain pack

7. Agent discovers warehouse schemas (LIST TABLES, GET SCHEMA)

8. Agent runs exploration:
    a. Sends schema + prompt to LLM
    b. LLM generates SQL query
    c. Agent executes query against warehouse
    d. Agent sends results back to LLM
    e. LLM generates next query based on results
    f. Repeat for N steps (default: 100)
    g. Each step written to run record in MongoDB (live progress)

9. Agent runs analysis per area:
    a. Loads area-specific prompt (e.g., analysis_churn.md)
    b. Feeds relevant exploration results to LLM
    c. LLM generates insights (JSON)
    d. Agent parses and assigns IDs

10. Agent validates insights:
    a. Selects each insight that has an affected_count
    b. Generates verification SQL for it
    c. Executes the SQL against the warehouse
    d. Compares the claimed count with the verified count

11. Agent generates recommendations:
    a. Feeds all validated insights to LLM
    b. LLM generates recommendations with related_insight_ids

12. Agent saves DiscoveryResult to MongoDB

13. Agent updates run status to "completed" (or "failed")

14. Dashboard polls for status, shows completed results
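Step 8 is the core of a run, and its control flow can be sketched in a few lines. The `LLM` and `Warehouse` interfaces, the `Step` type, and the `record` callback (standing in for the write to the run record) are hypothetical simplifications. Two behaviors are assumptions not stated above: the model may stop early by reporting `done`, and a failed query is fed back to the model as a result instead of aborting the run.

```go
package main

import "fmt"

// Step is one exploration round: the SQL the model wrote and what came back.
type Step struct{ SQL, Result string }

// LLM proposes the next query given everything seen so far (hypothetical).
type LLM interface {
	NextQuery(history []Step) (sql string, done bool, err error)
}

// Warehouse executes a read-only query and returns rows as text (hypothetical).
type Warehouse interface {
	Query(sql string) (rows string, err error)
}

// explore runs the ask, execute, feed-back loop for at most maxSteps rounds.
// record is called after every step, mirroring the live-progress write.
func explore(llm LLM, wh Warehouse, maxSteps int, record func(Step)) ([]Step, error) {
	var history []Step
	for i := 0; i < maxSteps; i++ {
		sql, done, err := llm.NextQuery(history)
		if err != nil {
			return history, fmt.Errorf("step %d: llm: %w", i+1, err)
		}
		if done {
			break // assumption: the model can end exploration early
		}
		rows, err := wh.Query(sql)
		if err != nil {
			rows = "ERROR: " + err.Error() // assumption: let the model see and correct failures
		}
		step := Step{SQL: sql, Result: rows}
		history = append(history, step)
		record(step)
	}
	return history, nil
}

// scriptedLLM replays a fixed list of queries, then reports done.
type scriptedLLM struct{ queries []string }

func (s scriptedLLM) NextQuery(history []Step) (string, bool, error) {
	if len(history) >= len(s.queries) {
		return "", true, nil
	}
	return s.queries[len(history)], false, nil
}

// fakeWarehouse echoes the query back instead of touching a real warehouse.
type fakeWarehouse struct{}

func (fakeWarehouse) Query(sql string) (string, error) { return "rows for: " + sql, nil }

func main() {
	llm := scriptedLLM{queries: []string{"SELECT 1", "SELECT 2"}}
	steps, err := explore(llm, fakeWarehouse{}, 100, func(s Step) {
		fmt.Println("recorded:", s.SQL)
	})
	fmt.Println(len(steps), err)
}
```

Passing the full history on every call is what lets each query build on the previous results, and the step cap bounds both LLM spend and warehouse load.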

Prompt Flow

Domain Pack provides template files (.md)

Project-level overrides stored in MongoDB (editable via dashboard)

Agent loads prompts (project overrides take priority)

Agent substitutes template variables:
    {{PROFILE}}          → JSON-encoded project profile
    {{PREVIOUS_CONTEXT}} → Previous discoveries + feedback
    {{SCHEMA_INFO}}      → Discovered table schemas
    {{DATASET}}          → Dataset names
    {{FILTER}}           → WHERE clause for multi-tenant
    {{QUERY_RESULTS}}    → Exploration query results (per area)
    ...

Rendered prompt sent to LLM

See Prompts for the full variable reference.
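The override-then-substitute flow can be sketched with the standard library alone. The helper names `selectPrompt` and `render`, the map-based storage, and the sample template text are assumptions; only the `{{NAME}}` placeholder syntax, the variable names, and the rule that project overrides win come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// selectPrompt returns the project-level override when one exists,
// otherwise the template shipped with the domain pack.
func selectPrompt(packTemplates, overrides map[string]string, name string) (string, bool) {
	if p, ok := overrides[name]; ok {
		return p, true
	}
	p, ok := packTemplates[name]
	return p, ok
}

// render replaces every {{KEY}} placeholder with its value in a single pass.
// Substituted values are not re-scanned, so a value that happens to contain
// "{{...}}" (for example inside query results) is left untouched.
func render(tmpl string, vars map[string]string) string {
	pairs := make([]string, 0, len(vars)*2)
	for k, v := range vars {
		pairs = append(pairs, "{{"+k+"}}", v)
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	// Sample template text is made up for the example.
	pack := map[string]string{
		"analysis_churn.md": "Dataset: {{DATASET}}\nFilter: {{FILTER}}",
	}
	overrides := map[string]string{
		"analysis_churn.md": "Analyze churn in {{DATASET}} where {{FILTER}}.",
	}
	tmpl, ok := selectPrompt(pack, overrides, "analysis_churn.md")
	if !ok {
		fmt.Println("prompt not found")
		return
	}
	fmt.Println(render(tmpl, map[string]string{
		"DATASET": "analytics",
		"FILTER":  "app_id = 'demo'",
	}))
}
```

Unknown placeholders are left in the output as-is in this sketch, which makes a missing variable easy to spot in the rendered prompt.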

Deployment Models

Local Development

Dashboard (npm run dev)  →  API (go run .)  →  Agent (subprocess)

MongoDB (Docker)

Docker Compose

Dashboard (container)  →  API (container)  →  Agent (subprocess inside API container)

MongoDB (container)

Kubernetes (Production)

Dashboard (Deployment)  →  API (Deployment)  →  Agent (K8s Job per discovery)

MongoDB (StatefulSet or external)

In Kubernetes mode (RUNNER_MODE=kubernetes), the API creates a K8s Job for each discovery run instead of spawning a subprocess. The agent runs as an isolated container with configurable CPU/memory limits.

Security Model

v0.1.0 (Current)

  • No authentication — Designed for internal/single-user deployment
  • API not publicly exposed — Dashboard proxies all requests
  • Secrets encrypted at rest — AES-256-GCM when using MongoDB provider with SECRET_ENCRYPTION_KEY
  • Warehouse read-only — Agent only executes SELECT queries
  • Per-project isolation — Each project has its own secrets, prompts, discoveries
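For readers unfamiliar with AES-256-GCM, here is a minimal sketch of encrypting and decrypting a secret with Go's standard library. It shows the general technique only: how DecisionBox derives a key from SECRET_ENCRYPTION_KEY and how it lays out the stored bytes are not described in this document, so the 32-byte raw key and the nonce-prefixed format below are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext || tag (layout is an assumption).
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to its first argument, the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and verifies the tag before returning plaintext.
func decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key; a real key must come from a secure source
	if _, err := rand.Read(key); err != nil {
		fmt.Println("error:", err)
		return
	}
	sealed, err := encrypt(key, []byte("warehouse-credential"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```

GCM is authenticated encryption, so a stored value that has been modified fails to decrypt instead of yielding corrupted plaintext. A fresh random nonce per call is essential, because reusing a nonce with the same key breaks GCM's guarantees.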

Future

  • Authentication (OAuth2 / Auth0)
  • Multi-user RBAC
  • API key authentication for external integrations