
Production Considerations

Version: 0.1.0

Recommendations for running DecisionBox in production.

Security

Secret Encryption

Always set SECRET_ENCRYPTION_KEY in production:

export SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)

Without it, LLM API keys are stored in plaintext in MongoDB.
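If you provision environments from a Node.js deploy script, the sketch below mirrors the openssl command and adds a pre-flight check that runs before the services start. It assumes DecisionBox expects a key that decodes to exactly 32 bytes, which is what the command above produces. The function names are hypothetical and not part of DecisionBox.

```javascript
const crypto = require("crypto");

// Equivalent of `openssl rand -base64 32`: 32 random bytes, base64-encoded (44 characters).
function generateEncryptionKey() {
  return crypto.randomBytes(32).toString("base64");
}

// Hypothetical pre-flight check. ASSUMPTION: the key must decode to exactly 32 bytes.
// Re-encoding and comparing rejects strings that Buffer.from() would silently accept
// after dropping invalid characters.
function isValidEncryptionKey(value) {
  if (typeof value !== "string" || value.length === 0) return false;
  const bytes = Buffer.from(value, "base64");
  return bytes.length === 32 && bytes.toString("base64") === value;
}

// Example: fail a deploy script early instead of starting with plaintext secrets.
// if (!isValidEncryptionKey(process.env.SECRET_ENCRYPTION_KEY)) process.exit(1);
```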

API Access

The API has no authentication in v0.1.0. It should not be exposed to the internet:

  • Docker Compose: Only expose the dashboard port (3000). Do not map the API port (8080) to the host; keep it reachable only on the internal Docker network.
  • Kubernetes: The API service should be ClusterIP (internal only). Only the dashboard needs an ingress.

The dashboard proxies all /api/* requests to the API server-side. Users never talk to the API directly.

Cloud Secret Providers

For production, use a cloud secret provider instead of storing secrets in MongoDB:

# GCP
SECRET_PROVIDER=gcp
SECRET_GCP_PROJECT_ID=my-project

# AWS
SECRET_PROVIDER=aws
SECRET_AWS_REGION=us-east-1

Cloud providers handle encryption, access control, audit logging, and key rotation.
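A deploy script can catch a half-configured provider before the API starts. The sketch below checks only the variables shown above. It assumes that leaving SECRET_PROVIDER unset means secrets stay in MongoDB. The helper name is hypothetical.

```javascript
// Variables each provider needs, taken from the GCP and AWS examples above.
const REQUIRED_VARS = {
  gcp: ["SECRET_GCP_PROJECT_ID"],
  aws: ["SECRET_AWS_REGION"],
};

// Hypothetical pre-flight check; returns human-readable problems (an empty array means OK).
// ASSUMPTION: an unset SECRET_PROVIDER means secrets are stored in MongoDB.
function checkSecretConfig(env) {
  const provider = env.SECRET_PROVIDER;
  if (!provider) {
    const problems = ["SECRET_PROVIDER is unset: secrets are stored in MongoDB"];
    if (!env.SECRET_ENCRYPTION_KEY) {
      problems.push("SECRET_ENCRYPTION_KEY is unset: LLM API keys would be stored in plaintext");
    }
    return problems;
  }
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_VARS, provider)) {
    return [`SECRET_PROVIDER=${provider} is not recognized by this check`];
  }
  return REQUIRED_VARS[provider]
    .filter((name) => !env[name])
    .map((name) => `${name} is required when SECRET_PROVIDER=${provider}`);
}
```

In a real script you would call `checkSecretConfig(process.env)` and refuse to deploy if the result is non-empty.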

Network

  • MongoDB should not be accessible from the internet
  • Use MongoDB authentication (username/password or x.509 certificates)
  • Enable TLS for MongoDB connections in production
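The authentication and TLS items can be checked mechanically against the connection string. The sketch below is a hypothetical audit, not a DecisionBox feature. It relies on documented MongoDB URI behavior: `tls=true` (or the older `ssl=true`) enables TLS, `mongodb+srv://` turns TLS on by default, and `authMechanism=MONGODB-X509` authenticates with a certificate instead of a password.

```javascript
// Hypothetical audit of a MongoDB connection string against the checklist above.
// Returns warnings; an empty array means credentials and TLS are both configured.
function auditMongoUri(uri) {
  const match = /^(mongodb(?:\+srv)?):\/\/(?:([^@/]*)@)?([^/?]+)(?:\/[^?]*)?(?:\?(.*))?$/.exec(uri);
  if (!match) return ["not a mongodb:// or mongodb+srv:// connection string"];
  const [, scheme, userinfo, , query] = match;
  const params = new URLSearchParams(query || "");
  const warnings = [];

  // x.509 clients authenticate with a certificate, so no username/password is expected.
  const x509 = params.get("authMechanism") === "MONGODB-X509";
  if (!userinfo && !x509) warnings.push("no credentials (username/password or x.509)");

  // An explicit tls/ssl option wins; otherwise mongodb+srv:// enables TLS by default.
  const explicitTls = params.get("tls") ?? params.get("ssl");
  const tlsOn = explicitTls !== null ? explicitTls === "true" : scheme === "mongodb+srv";
  if (!tlsOn) warnings.push("TLS is not enabled");

  return warnings;
}
```

Parsing with a regular expression rather than `new URL()` keeps multi-host replica set strings such as `db1:27017,db2:27017` working.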

Scaling

Current Limitations

  • Single agent per discovery: Each discovery run spawns one agent process. Runs for different projects can execute in parallel, but starting a second run for a project that already has one in progress is rejected with 409 Conflict.
  • No horizontal API scaling: The API stores run state in MongoDB. Multiple API replicas can serve reads, but run management may hit race conditions. Run a single API replica for now.
  • Dashboard is stateless: It can be scaled horizontally with multiple replicas behind a load balancer.

Resource Sizing

| Component (memory / CPU) | Small (dev) | Medium (production) | Large (heavy use) |
|---|---|---|---|
| API | 256Mi / 0.5 CPU | 512Mi / 1 CPU | 1Gi / 2 CPU |
| Agent | 256Mi / 0.5 CPU | 1Gi / 2 CPU | 2Gi / 4 CPU |
| Dashboard | 128Mi / 0.25 CPU | 256Mi / 0.5 CPU | 512Mi / 1 CPU |
| MongoDB | 512Mi / 1 CPU | 2Gi / 2 CPU | 8Gi / 4 CPU |
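Each discovery run spawns its own agent process, and different projects can run in parallel. The agent row therefore scales with the number of concurrent runs. The sketch below totals a tier from the table and multiplies any row by an instance count. The concurrency figure is your own estimate, node headroom for system pods is not included, and the helper names are hypothetical.

```javascript
// Per-component sizes from the table above: [memory, CPU cores].
const SIZING = {
  small: { api: ["256Mi", 0.5], agent: ["256Mi", 0.5], dashboard: ["128Mi", 0.25], mongodb: ["512Mi", 1] },
  medium: { api: ["512Mi", 1], agent: ["1Gi", 2], dashboard: ["256Mi", 0.5], mongodb: ["2Gi", 2] },
  large: { api: ["1Gi", 2], agent: ["2Gi", 4], dashboard: ["512Mi", 1], mongodb: ["8Gi", 4] },
};

// Convert a Kubernetes-style quantity (only Mi and Gi are handled here) to MiB.
function toMi(quantity) {
  const m = /^(\d+(?:\.\d+)?)(Mi|Gi)$/.exec(quantity);
  if (!m) throw new Error(`unsupported quantity: ${quantity}`);
  return Number(m[1]) * (m[2] === "Gi" ? 1024 : 1);
}

// Total memory (MiB) and CPU for a tier; `counts` overrides the instance count per component.
function tierTotals(tier, counts = {}) {
  let memoryMi = 0;
  let cpu = 0;
  for (const [component, [memory, cores]] of Object.entries(SIZING[tier])) {
    const n = counts[component] ?? 1;
    memoryMi += toMi(memory) * n;
    cpu += cores * n;
  }
  return { memoryMi, cpu };
}
```

For example, `tierTotals("medium")` gives 3,840 MiB and 5.5 CPU. `tierTotals("medium", { agent: 3 })`, for three concurrent discovery runs, gives 5,888 MiB and 9.5 CPU.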

Agent resource usage depends on:

  • Number of exploration steps
  • Size of warehouse query results
  • LLM response sizes

MongoDB

For production MongoDB:

  • Use MongoDB Atlas (managed) or a MongoDB operator on Kubernetes
  • Enable replica set for availability
  • Set appropriate WiredTiger cache size
  • Index maintenance is automatic (API creates indexes on startup)
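When choosing a WiredTiger cache size, start from MongoDB's documented default: the larger of 50% of (RAM - 1 GB) and 256 MB. If the container has a memory limit, MongoDB uses that limit as RAM. The sketch applies this rule to the MongoDB column of the sizing table, treating Mi and MB as the same unit for simplicity.

```javascript
// MongoDB's documented default WiredTiger cache size:
// the larger of 50% of (RAM - 1 GB) and 256 MB. Input and output are in MB (1 GB = 1024 MB).
function defaultWiredTigerCacheMB(memoryMB) {
  return Math.max(0.5 * (memoryMB - 1024), 256);
}

// Apply it to the MongoDB memory limits from the sizing table.
const mongoLimitsMB = { small: 512, medium: 2048, large: 8192 };
for (const [tier, memoryMB] of Object.entries(mongoLimitsMB)) {
  console.log(`${tier}: ${memoryMB} MB limit -> ${defaultWiredTigerCacheMB(memoryMB)} MB cache`);
}
```

The result is 256 MB for small, 512 MB for medium, and 3,584 MB for large. To override the default, set `storage.wiredTiger.engineConfig.cacheSizeGB` in the mongod configuration. You can confirm the effective value in mongosh with `db.serverStatus().wiredTiger.cache["maximum bytes configured"]`.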

Monitoring

Health Endpoints

| Endpoint | Purpose | Probe interval |
|---|---|---|
| GET /health | Liveness (API process alive) | K8s: every 10s |
| GET /health/ready | Readiness (MongoDB connected) | K8s: every 10s |
| GET /health on :3000 | Dashboard + API connectivity | K8s: every 30s |
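Outside Kubernetes, for example in a Docker Compose deploy script, you can use the readiness endpoint to wait for the API before running smoke tests. The sketch below is a hypothetical helper. It assumes the endpoint answers with a 2xx status once MongoDB is connected and with a non-2xx status otherwise. It needs Node 18+ for the global `fetch`.

```javascript
// Hypothetical deploy-script helper: poll a readiness endpoint until it returns 2xx.
// ASSUMPTION: /health/ready returns a non-2xx status until MongoDB is connected.
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped out in tests.
async function waitForReady(url, { attempts = 30, delayMs = 2000, fetchImpl = globalThis.fetch } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return true; // 2xx: the API reports ready
    } catch {
      // Connection refused or DNS failure: the API is not listening yet.
    }
    if (attempt < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// Usage, from a host that can reach the API (it should not be exposed publicly):
// waitForReady("http://localhost:8080/health/ready").then((ok) => process.exit(ok ? 0 : 1));
```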

Logs

Both API and agent write structured JSON logs to stderr in production mode (ENV=prod):

{"level":"info","ts":"2026-03-14T10:30:00.000Z","msg":"Discovery completed","service":"decisionbox-agent","project_id":"507f...","insights":7,"duration":"5m23s"}

Collect with any log aggregator (Loki, CloudWatch, Cloud Logging, Datadog).
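For quick triage without an aggregator, a small filter over saved log output is often enough. This is a sketch, not a DecisionBox tool, and the function name is hypothetical. The sample line above shows only the `info` level, so the other level names are assumed. The filter skips lines that are not JSON, so capture Compose output without service prefixes (`docker compose logs --no-log-prefix`).

```javascript
// ASSUMPTION: level names follow the common debug/info/warn/error/fatal scale;
// only "info" appears in the sample log line.
const LEVEL_RANK = new Map([["debug", 0], ["info", 1], ["warn", 2], ["error", 3], ["fatal", 4]]);

// Return parsed entries at or above `minLevel`, optionally for a single service.
// Lines that are not JSON objects are skipped.
function filterLogs(text, { minLevel = "warn", service } = {}) {
  const min = LEVEL_RANK.get(minLevel);
  if (min === undefined) throw new Error(`unknown level: ${minLevel}`);
  const entries = [];
  for (const line of text.split("\n")) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;
    }
    if (!entry || typeof entry !== "object") continue;
    const rank = LEVEL_RANK.get(String(entry.level).toLowerCase());
    if (rank === undefined || rank < min) continue;
    if (service && entry.service !== service) continue;
    entries.push(entry);
  }
  return entries;
}
```

For example, read a saved log file with `fs.readFileSync(path, "utf8")` and pass the text to `filterLogs(text, { service: "decisionbox-agent" })`.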

Key Metrics to Watch

| Metric | Source | Alert on |
|---|---|---|
| Discovery run duration | Agent logs | > 30 minutes |
| Discovery failures | discovery_runs.status = "failed" | Any failure |
| LLM timeouts | Agent logs (ERROR level) | Repeated timeouts |
| Warehouse query errors | Agent logs | > 10% failure rate |
| MongoDB connection errors | API logs | Any connection error |
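Most aggregators can express the duration rule natively. The sketch below shows the same rule applied to raw log lines. It assumes every completed run logs `"msg":"Discovery completed"` with a `duration` string in the format of the sample line (for example `5m23s`). Failed runs are better tracked through `discovery_runs.status`. The helper names are hypothetical.

```javascript
// Seconds per unit in duration strings such as "5m23s", as in the sample log line.
const UNIT_SECONDS = { h: 3600, m: 60, s: 1, ms: 1e-3, us: 1e-6, "µs": 1e-6, ns: 1e-9 };

// Parse "1h2m3.5s"-style durations into seconds; returns NaN for anything else.
function parseDuration(text) {
  if (typeof text !== "string") return NaN;
  const re = /(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)/y; // sticky: each match starts where the last ended
  let seconds = 0;
  let consumed = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    seconds += Number(m[1]) * UNIT_SECONDS[m[2]];
    consumed = re.lastIndex;
  }
  return consumed > 0 && consumed === text.length ? seconds : NaN;
}

// Apply the "> 30 minutes" alert rule to "Discovery completed" log lines.
// ASSUMPTION: the message text and the duration field match the sample log line.
function slowRuns(lines, thresholdSeconds = 30 * 60) {
  const slow = [];
  for (const line of lines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;
    }
    if (!entry || entry.msg !== "Discovery completed") continue;
    const seconds = parseDuration(entry.duration);
    if (seconds > thresholdSeconds) slow.push({ project_id: entry.project_id, seconds });
  }
  return slow;
}
```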

Backup and Recovery

MongoDB

# Backup
mongodump --uri="$MONGODB_URI" --db=decisionbox --out=./backup

# Restore
mongorestore --uri="$MONGODB_URI" --db=decisionbox ./backup/decisionbox

What to Back Up

| Collection | Priority | Size |
|---|---|---|
| projects | Critical | Small |
| secrets | Critical | Small |
| discoveries | Important | Large (grows over time) |
| feedback | Important | Small |
| discovery_runs | Low (ephemeral) | Medium |
| discovery_debug_logs | Low (TTL: 30 days) | Large |
| pricing | Low (auto-seeded) | Tiny |
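If you decide routine dumps only need the critical and important collections, mongodump's `--excludeCollection` option (repeatable, and only valid together with `--db`) can skip the low-priority ones. The helper below is a hypothetical sketch that builds those arguments.

```javascript
// Collections the table above marks as low priority.
const LOW_PRIORITY = ["discovery_runs", "discovery_debug_logs", "pricing"];

// Hypothetical helper: mongodump arguments that keep critical and important collections only.
// --excludeCollection may be repeated and requires --db.
function mongodumpArgs(uri, outDir, { db = "decisionbox", exclude = LOW_PRIORITY } = {}) {
  return [
    `--uri=${uri}`,
    `--db=${db}`,
    ...exclude.map((name) => `--excludeCollection=${name}`),
    `--out=${outDir}`,
  ];
}
```

To run it, pass the array to `require("child_process").spawnSync("mongodump", mongodumpArgs(process.env.MONGODB_URI, "./backup"), { stdio: "inherit" })`. Because no shell is involved, special characters in the URI are not interpreted. The output layout matches the restore command above.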

Maintenance

Cleaning Up Old Data

Discovery debug logs have a 30-day TTL index and are cleaned up automatically by MongoDB.
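To confirm the TTL index is in place, inspect the output of `getIndexes()`. MongoDB's TTL monitor runs about every 60 seconds, so expired documents are removed shortly after they pass 30 days rather than at an exact moment. The functions below are a hypothetical sketch. They work in Node on a saved index list or can be pasted into mongosh, and they assume `expireAfterSeconds` comes back as a plain number.

```javascript
// 30 days in seconds, the unit TTL indexes use for expireAfterSeconds.
const THIRTY_DAYS_S = 30 * 24 * 60 * 60; // 2592000

// Given the output of getIndexes(), return the TTL indexes (those with expireAfterSeconds).
function ttlIndexes(indexes) {
  return indexes.filter((ix) => typeof ix.expireAfterSeconds === "number");
}

// True if some TTL index expires documents after exactly 30 days.
function hasThirtyDayTtl(indexes) {
  return ttlIndexes(indexes).some((ix) => ix.expireAfterSeconds === THIRTY_DAYS_S);
}

// In mongosh, after pasting the functions above:
// hasThirtyDayTtl(db.discovery_debug_logs.getIndexes())
```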

Discovery results accumulate indefinitely. Consider periodically archiving or deleting old discoveries:

// Delete discoveries older than 90 days
db.discoveries.deleteMany({
  created_at: { $lt: new Date(Date.now() - 90*24*60*60*1000) }
})
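To archive before deleting, compute the cutoff once and use the same value for both the archive and the delete. Otherwise documents that cross the boundary between the two commands could be deleted without being archived. The sketch below builds the Extended JSON filter that `mongodump --query` accepts. It assumes `created_at` is stored as a BSON date, as the `deleteMany` example implies. The helper names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Cutoff for a retention window (same arithmetic as the deleteMany filter above).
function retentionCutoff(now, days) {
  return new Date(now.getTime() - days * DAY_MS);
}

// Extended JSON filter for `mongodump --query`, matching what deleteMany removes.
// ASSUMPTION: created_at is a BSON date field.
function archiveQuery(cutoff) {
  return JSON.stringify({ created_at: { $lt: { $date: cutoff.toISOString() } } });
}

// Then, with the printed values:
//   mongodump --uri="$MONGODB_URI" --db=decisionbox --collection=discoveries \
//     --query='<archiveQuery output>' --out=./archive
//   db.discoveries.deleteMany({ created_at: { $lt: new Date("<cutoff ISO string>") } })
```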

Updating

  1. Pull new images (or rebuild from source)
  2. Stop services
  3. Start services. The API re-creates its indexes automatically on startup (an idempotent operation).
  4. No database migrations are needed, because MongoDB is schema-flexible.

Next Steps