Version: 0.1.0
This page documents the core data structures used across DecisionBox.
DiscoveryResult
A complete discovery run output. Stored in the discoveries MongoDB collection.
| Field | Type | Description |
|---|
id | string | MongoDB ObjectID |
project_id | string | Project that owns this discovery |
domain | string | Domain (e.g., gaming) |
category | string | Category (e.g., match3) |
run_type | string | full (all areas), partial (some areas or some failed), failed (all areas failed) |
areas_requested | string[] | Area IDs requested (empty for full run) |
discovery_date | timestamp | When the discovery ran |
total_steps | int | Number of exploration steps executed |
duration | int64 | Duration in nanoseconds |
insights | Insight[] | Discovered patterns |
recommendations | Recommendation[] | Actionable advice |
summary | Summary | Aggregate stats |
exploration_log | ExplorationStep[] | Every SQL query + AI reasoning |
analysis_log | AnalysisStep[] | Full LLM dialog per analysis area |
recommendation_log | RecommendationStep | Full LLM dialog for recommendations |
validation_log | ValidationResult[] | Verification queries + results |
created_at | timestamp | When result was saved |
Insight
A discovered pattern or finding. Generated by the analysis phase.
| Field | Type | Description |
|---|
id | string | Deterministic ID: {area}-{index} (e.g., churn-1, monetization-3). Auto-generated if LLM omits it. |
analysis_area | string | Which area found this (e.g., churn, levels) |
name | string | Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return") |
description | string | Detailed description with exact numbers and percentages |
severity | string | critical, high, medium, or low |
affected_count | int | Number of affected users (COUNT DISTINCT user_id) |
risk_score | float64 | 0.0 to 1.0 risk assessment |
confidence | float64 | 0.0 to 1.0 confidence level |
metrics | map | Flexible key-value metrics (e.g., {"churn_rate": 0.67, "avg_sessions": 3.2}) |
indicators | string[] | Specific metric indicators (e.g., "Session drop: 12min → 4min") |
target_segment | string | Description of affected user segment |
source_steps | int[] | Exploration step numbers that support this insight |
validation | InsightValidation | Warehouse verification result (if validated) |
discovered_at | timestamp | When this insight was generated |
InsightValidation
Attached to an insight after warehouse verification.
| Field | Type | Description |
|---|
status | string | confirmed, adjusted, rejected, or error |
original_count | int | Count claimed by the AI |
verified_count | int | Count verified from the warehouse |
reasoning | string | Explanation of the validation result |
validated_at | timestamp | When validation was performed |
Recommendation
An actionable suggestion based on discovered insights.
| Field | Type | Description |
|---|
id | string | Recommendation ID |
category | string | Category: churn, engagement, monetization, difficulty |
title | string | Specific action title |
description | string | Detailed explanation with numbers |
priority | int | 1 (critical) to 5 (optional). P1 = highest priority. |
target_segment | string | Exact segment criteria |
segment_size | int | Number of users in the segment |
expected_impact | Impact | Expected improvement |
actions | string[] | Numbered implementation steps |
related_insight_ids | string[] | IDs of insights this recommendation addresses (e.g., ["churn-1", "levels-2"]) |
confidence | float64 | 0.0 to 1.0 confidence |
Impact
Expected impact of a recommendation.
| Field | Type | Description |
|---|
metric | string | What metric improves (e.g., retention_rate, revenue) |
estimated_improvement | string | Expected improvement (e.g., "+15-20%", "+$4,975/month") |
reasoning | string | Why this improvement is expected |
Summary
Aggregate stats for a discovery run.
| Field | Type | Description |
|---|
total_insights | int | Number of insights generated |
total_recommendations | int | Number of recommendations generated |
queries_executed | int | Number of SQL queries executed |
errors | string[] | Error messages from failed analysis areas (if any) |
ExplorationStep
One step in the autonomous exploration phase. Represents a single LLM call + SQL query.
| Field | Type | Description |
|---|
step | int | Step number (1-based) |
timestamp | timestamp | When this step ran |
action | string | Always query_data |
thinking | string | AI's reasoning for this query |
query_purpose | string | Short description of query intent |
query | string | The SQL query executed |
row_count | int | Number of rows returned |
execution_time_ms | int64 | Query execution time in milliseconds |
error | string | Error message if query failed |
fixed | bool | True if the query was auto-fixed after a SQL error |
AnalysisStep
Full LLM dialog for one analysis area. Captures the complete prompt and response.
| Field | Type | Description |
|---|
area_id | string | Analysis area ID (e.g., churn) |
area_name | string | Display name (e.g., Churn Risks) |
run_at | timestamp | When this analysis ran |
relevant_queries | int | Number of exploration queries used as context |
tokens_in | int | Input tokens consumed |
tokens_out | int | Output tokens generated |
duration_ms | int64 | LLM call duration in milliseconds |
insight_count | int | Number of insights extracted |
error | string | Error message if analysis failed |
ValidationResult
Warehouse verification of an insight's claims.
| Field | Type | Description |
|---|
insight_id | string | ID of the validated insight |
analysis_area | string | Area this insight belongs to |
claimed_count | int | Count claimed by the AI |
verified_count | int | Count verified from the warehouse |
status | string | confirmed, adjusted, rejected, error |
reasoning | string | Explanation of the result |
query | string | The verification SQL query |
validated_at | timestamp | When validation was performed |
DiscoveryRunStatus
Live status of a running discovery. Stored in discovery_runs collection, updated in real-time.
| Field | Type | Description |
|---|
id | string | Run ID |
project_id | string | Project being discovered |
status | string | pending, running, completed, failed, cancelled |
phase | string | Current phase: init, schema_discovery, exploration, analysis, validation, recommendations, saving, complete |
phase_detail | string | Human-readable phase description |
progress | int | 0 to 100 percentage |
started_at | timestamp | When the run started |
updated_at | timestamp | Last status update |
completed_at | timestamp | When the run finished (null if running) |
error | string | Error message (if failed) |
steps | RunStep[] | Live step feed |
total_queries | int | Total SQL queries executed |
successful_queries | int | Queries that returned results |
failed_queries | int | Queries that errored |
insights_found | int | Insights generated so far |
RunStep
One step in the live progress feed.
| Field | Type | Description |
|---|
phase | string | Which phase this step belongs to |
step_num | int | Step number |
timestamp | timestamp | When this step occurred |
type | string | query, insight, analysis, validation, recommendation, error |
message | string | Step description |
llm_thinking | string | AI's reasoning text |
query | string | SQL query (if type=query) |
query_result | string | Query result summary |
row_count | int | Rows returned |
query_time_ms | int | Query execution time |
query_fixed | bool | Whether query was auto-fixed |
insight_name | string | Insight name (if type=insight) |
insight_severity | string | Insight severity (if type=insight) |
error | string | Error message (if type=error) |
Feedback
User feedback on insights, recommendations, or exploration steps.
| Field | Type | Description |
|---|
id | string | Feedback ID |
project_id | string | Project ID |
discovery_id | string | Discovery run ID |
target_type | string | insight, recommendation, exploration_step |
target_id | string | ID of the rated item |
rating | string | like or dislike |
comment | string | Optional comment (typically with dislikes) |
created_at | timestamp | When feedback was submitted |
Project
Project configuration. Stored in projects collection.
| Field | Type | Description |
|---|
id | string | MongoDB ObjectID |
name | string | Project name |
description | string | Project description |
domain | string | Domain (e.g., gaming) |
category | string | Category (e.g., match3) |
warehouse | WarehouseConfig | Data warehouse configuration |
llm | LLMConfig | LLM provider configuration |
schedule | ScheduleConfig | Discovery schedule |
profile | map | Domain-specific profile (from JSON Schema form) |
prompts | ProjectPrompts | Per-project prompt overrides |
status | string | Project status |
last_run_at | timestamp | When the last discovery ran |
last_run_status | string | Last run result |
created_at | timestamp | When the project was created |
updated_at | timestamp | Last update |
WarehouseConfig
| Field | Type | Description |
|---|
provider | string | Provider ID: bigquery, redshift |
project_id | string | GCP project ID (BigQuery) |
datasets | string[] | Dataset/schema names |
location | string | Data location |
filter_field | string | Multi-tenant filter column |
filter_value | string | Multi-tenant filter value |
config | map | Provider-specific key-value config |
LLMConfig
| Field | Type | Description |
|---|
provider | string | Provider ID: claude, openai, ollama, vertex-ai, bedrock |
model | string | Model identifier (free text) |
config | map | Provider-specific key-value config (e.g., project_id, location for Vertex AI) |
ScheduleConfig
| Field | Type | Description |
|---|
enabled | bool | Whether automatic discovery is enabled |
cron_expr | string | Cron expression (e.g., 0 2 * * * = daily at 2 AM) |
max_steps | int | Max exploration steps for scheduled runs |
Next Steps