Version: 0.3.0

Data Models Reference

Version: 0.1.0

This page documents the core data structures used across DecisionBox.

DiscoveryResult

A complete discovery run output. Stored in the discoveries MongoDB collection.

| Field | Type | Description |
| --- | --- | --- |
| id | string | MongoDB ObjectID |
| project_id | string | Project that owns this discovery |
| domain | string | Domain (e.g., gaming, social) |
| category | string | Category (e.g., match3, idle, casual, content_sharing) |
| run_type | string | full (all areas), partial (some areas or some failed), failed (all areas failed) |
| areas_requested | string[] | Area IDs requested (empty for full run) |
| discovery_date | timestamp | When the discovery ran |
| total_steps | int | Number of exploration steps executed |
| duration | int64 | Duration in nanoseconds |
| insights | Insight[] | Discovered patterns |
| recommendations | Recommendation[] | Actionable advice |
| summary | Summary | Aggregate stats |
| exploration_log | ExplorationStep[] | Every SQL query + AI reasoning |
| analysis_log | AnalysisStep[] | Full LLM dialog per analysis area |
| recommendation_log | RecommendationStep | Full LLM dialog for recommendations |
| validation_log | ValidationResult[] | Verification queries + results |
| created_at | timestamp | When result was saved |
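
The run_type and duration fields can be read as in the following minimal Python sketch. The function names and the all_areas argument are hypothetical, and the classification logic is only one interpretation of the run_type description above, not DecisionBox's actual implementation.

```python
from datetime import timedelta


def classify_run(areas_requested, failed_areas, all_areas):
    """Derive a run_type value from requested and failed area IDs (assumed logic)."""
    # An empty areas_requested list means a full run over every area.
    attempted = list(areas_requested) or list(all_areas)
    failed = set(failed_areas)
    if attempted and all(area in failed for area in attempted):
        return "failed"
    # A subset of areas was requested, or at least one area failed.
    if areas_requested or failed:
        return "partial"
    return "full"


def duration_to_timedelta(duration_ns):
    """Convert the nanosecond duration field to a timedelta (microsecond precision)."""
    return timedelta(microseconds=duration_ns // 1_000)
```

Because timedelta stores microseconds, any sub-microsecond remainder of the nanosecond value is dropped in this sketch.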

Insight

A discovered pattern or finding. Generated by the analysis phase.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Deterministic ID: {area}-{index} (e.g., churn-1, monetization-3). Auto-generated if LLM omits it. |
| analysis_area | string | Which area found this (e.g., churn, levels) |
| name | string | Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return") |
| description | string | Detailed description with exact numbers and percentages |
| severity | string | critical, high, medium, or low |
| affected_count | int | Number of affected users (COUNT DISTINCT user_id) |
| risk_score | float64 | 0.0 to 1.0 risk assessment |
| confidence | float64 | 0.0 to 1.0 confidence level |
| metrics | map | Flexible key-value metrics (e.g., {"churn_rate": 0.67, "avg_sessions": 3.2}) |
| indicators | string[] | Specific metric indicators (e.g., "Session drop: 12min → 4min") |
| target_segment | string | Description of affected user segment |
| source_steps | int[] | Exploration step numbers that support this insight |
| validation | InsightValidation | Warehouse verification result (if validated) |
| discovered_at | timestamp | When this insight was generated |
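
The {area}-{index} ID scheme can be sketched as below. This is an illustration only: the function name is hypothetical, and it assumes the index is the 1-based position of the insight within its analysis area, which matches the examples (churn-1, monetization-3) but is not confirmed by this page.

```python
from collections import defaultdict


def assign_insight_ids(insights):
    """Fill in missing insight IDs as {area}-{index}; modifies the dicts in place."""
    counters = defaultdict(int)
    for insight in insights:
        area = insight["analysis_area"]
        counters[area] += 1  # assumed: 1-based position within the area
        if not insight.get("id"):
            insight["id"] = f"{area}-{counters[area]}"
    return insights
```

Since the ID depends only on the area and the position, re-running the function on the same ordered list yields the same IDs.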

InsightValidation

Attached to an insight after warehouse verification.

| Field | Type | Description |
| --- | --- | --- |
| status | string | confirmed, adjusted, rejected, or error |
| original_count | int | Count claimed by the AI |
| verified_count | int | Count verified from the warehouse |
| reasoning | string | Explanation of the validation result |
| validated_at | timestamp | When validation was performed |

Recommendation

An actionable suggestion based on discovered insights.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Recommendation ID |
| category | string | Category: churn, engagement, monetization, difficulty |
| title | string | Specific action title |
| description | string | Detailed explanation with numbers |
| priority | int | 1 (critical) to 5 (optional). P1 = highest priority. |
| target_segment | string | Exact segment criteria |
| segment_size | int | Number of users in the segment |
| expected_impact | Impact | Expected improvement |
| actions | string[] | Numbered implementation steps |
| related_insight_ids | string[] | IDs of insights this recommendation addresses (e.g., ["churn-1", "levels-2"]) |
| confidence | float64 | 0.0 to 1.0 confidence |

Impact

Expected impact of a recommendation.

| Field | Type | Description |
| --- | --- | --- |
| metric | string | What metric improves (e.g., retention_rate, revenue) |
| estimated_improvement | string | Expected improvement (e.g., "+15-20%", "+$4,975/month") |
| reasoning | string | Why this improvement is expected |
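
A consumer of these records might order recommendations by priority and resolve related_insight_ids back to insights. The sketch below assumes recommendations and insights are plain dicts keyed by the field names above; the function name and the tie-break on confidence are hypothetical choices, not documented behavior.

```python
def order_recommendations(recommendations, insights):
    """Return (recommendation, related_insights) pairs, most urgent first."""
    insights_by_id = {insight["id"]: insight for insight in insights}
    # Priority 1 is the most urgent; higher confidence breaks ties (assumed rule).
    ordered = sorted(
        recommendations,
        key=lambda rec: (rec["priority"], -rec.get("confidence", 0.0)),
    )
    pairs = []
    for rec in ordered:
        related = [
            insights_by_id[insight_id]
            for insight_id in rec.get("related_insight_ids", [])
            if insight_id in insights_by_id  # skip IDs with no matching insight
        ]
        pairs.append((rec, related))
    return pairs
```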

Summary

Aggregate stats for a discovery run.

| Field | Type | Description |
| --- | --- | --- |
| total_insights | int | Number of insights generated |
| total_recommendations | int | Number of recommendations generated |
| queries_executed | int | Number of SQL queries executed |
| errors | string[] | Error messages from failed analysis areas (if any) |

ExplorationStep

One step in the autonomous exploration phase. Represents a single LLM call + SQL query.

| Field | Type | Description |
| --- | --- | --- |
| step | int | Step number (1-based) |
| timestamp | timestamp | When this step ran |
| action | string | Always query_data |
| thinking | string | AI's reasoning for this query |
| query_purpose | string | Short description of query intent |
| query | string | The SQL query executed |
| row_count | int | Number of rows returned |
| execution_time_ms | int64 | Query execution time in milliseconds |
| error | string | Error message if query failed |
| fixed | bool | True if the query was auto-fixed after a SQL error |
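
As a rough illustration of how an exploration_log could be summarized, the following sketch aggregates the fields above. It assumes each step is a dict and that a non-empty error string marks a failed query; the function name and the output keys are hypothetical.

```python
def exploration_stats(exploration_log):
    """Aggregate basic counters over a list of ExplorationStep dicts."""
    return {
        "steps": len(exploration_log),
        # A non-empty error string is treated as a failed query (assumption).
        "failed": sum(1 for step in exploration_log if step.get("error")),
        "auto_fixed": sum(1 for step in exploration_log if step.get("fixed")),
        "rows_returned": sum(step.get("row_count", 0) for step in exploration_log),
        "total_query_ms": sum(step.get("execution_time_ms", 0) for step in exploration_log),
    }
```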

AnalysisStep

Full LLM dialog for one analysis area. Captures the complete prompt and response.

| Field | Type | Description |
| --- | --- | --- |
| area_id | string | Analysis area ID (e.g., churn) |
| area_name | string | Display name (e.g., Churn Risks) |
| run_at | timestamp | When this analysis ran |
| relevant_queries | int | Number of exploration queries used as context |
| tokens_in | int | Input tokens consumed |
| tokens_out | int | Output tokens generated |
| duration_ms | int64 | LLM call duration in milliseconds |
| insight_count | int | Number of insights extracted |
| error | string | Error message if analysis failed |
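
The token and timing fields lend themselves to a per-run usage total. The sketch below is one way to compute it, assuming analysis_log is a list of dicts with the fields above; the function name and the shape of the returned dict are hypothetical.

```python
def token_usage(analysis_log):
    """Sum token counts and LLM time across all AnalysisStep dicts."""
    totals = {
        "tokens_in": 0,
        "tokens_out": 0,
        "duration_ms": 0,
        "insights": 0,
        "failed_areas": [],
    }
    for step in analysis_log:
        totals["tokens_in"] += step.get("tokens_in", 0)
        totals["tokens_out"] += step.get("tokens_out", 0)
        totals["duration_ms"] += step.get("duration_ms", 0)
        totals["insights"] += step.get("insight_count", 0)
        if step.get("error"):
            # Record which areas failed so they can be reported or retried.
            totals["failed_areas"].append(step["area_id"])
    return totals
```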

ValidationResult

Warehouse verification of an insight's claims.

| Field | Type | Description |
| --- | --- | --- |
| insight_id | string | ID of the validated insight |
| analysis_area | string | Area this insight belongs to |
| claimed_count | int | Count claimed by the AI |
| verified_count | int | Count verified from the warehouse |
| status | string | confirmed, adjusted, rejected, error |
| reasoning | string | Explanation of the result |
| query | string | The verification SQL query |
| validated_at | timestamp | When validation was performed |
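
ValidationResult and InsightValidation share most of their fields, so a validation_log entry can plausibly be projected onto an insight's validation field. The sketch below assumes that claimed_count corresponds to original_count (both are described as the count claimed by the AI); the function name is hypothetical and this mapping is not confirmed by the page.

```python
def attach_validations(insights, validation_log):
    """Copy each ValidationResult onto its insight as an InsightValidation dict."""
    results_by_insight = {result["insight_id"]: result for result in validation_log}
    for insight in insights:
        result = results_by_insight.get(insight["id"])
        if result is None:
            continue  # insight was not validated; leave it without a validation field
        insight["validation"] = {
            "status": result["status"],
            "original_count": result["claimed_count"],  # assumed field mapping
            "verified_count": result["verified_count"],
            "reasoning": result["reasoning"],
            "validated_at": result["validated_at"],
        }
    return insights
```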

DiscoveryRunStatus

Live status of a running discovery. Stored in the discovery_runs collection and updated in real time.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Run ID |
| project_id | string | Project being discovered |
| status | string | pending, running, completed, failed, cancelled |
| phase | string | Current phase: init, schema_discovery, exploration, analysis, validation, recommendations, saving, complete |
| phase_detail | string | Human-readable phase description |
| progress | int | 0 to 100 percentage |
| started_at | timestamp | When the run started |
| updated_at | timestamp | Last status update |
| completed_at | timestamp | When the run finished (null if running) |
| error | string | Error message (if failed) |
| steps | RunStep[] | Live step feed |
| total_queries | int | Total SQL queries executed |
| successful_queries | int | Queries that returned results |
| failed_queries | int | Queries that errored |
| insights_found | int | Insights generated so far |
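
A client that polls a run status typically needs to know when to stop and what to display. The sketch below assumes that completed, failed, and cancelled are the terminal statuses, which is inferred from their names rather than stated on this page; both function names are hypothetical.

```python
# Assumed terminal statuses; pending and running are treated as in progress.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def is_finished(run):
    """Return True once the run can no longer change state (assumed semantics)."""
    return run["status"] in TERMINAL_STATUSES


def status_line(run):
    """Render a one-line progress summary from a DiscoveryRunStatus dict."""
    # Fall back to the phase name when no human-readable detail is set.
    detail = run.get("phase_detail") or run["phase"]
    return f"[{run['progress']:3d}%] {run['status']}: {detail}"
```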

RunStep

One step in the live progress feed.

| Field | Type | Description |
| --- | --- | --- |
| phase | string | Which phase this step belongs to |
| step_num | int | Step number |
| timestamp | timestamp | When this step occurred |
| type | string | query, insight, analysis, validation, recommendation, error |
| message | string | Step description |
| llm_thinking | string | AI's reasoning text |
| query | string | SQL query (if type=query) |
| query_result | string | Query result summary |
| row_count | int | Rows returned |
| query_time_ms | int | Query execution time |
| query_fixed | bool | Whether query was auto-fixed |
| insight_name | string | Insight name (if type=insight) |
| insight_severity | string | Insight severity (if type=insight) |
| error | string | Error message (if type=error) |
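
Because several RunStep fields only apply to a particular type, a renderer usually branches on type. The following is a minimal sketch of such a formatter, assuming steps are dicts; the function name and the output format are hypothetical.

```python
def format_step(step):
    """Render one RunStep dict as a single log line, using type-specific fields."""
    prefix = f"{step['phase']}#{step['step_num']} [{step['type']}]"
    if step["type"] == "query":
        fixed = " (auto-fixed)" if step.get("query_fixed") else ""
        rows = step.get("row_count", 0)
        elapsed = step.get("query_time_ms", 0)
        return f"{prefix} {rows} rows in {elapsed} ms{fixed}"
    if step["type"] == "insight":
        severity = step.get("insight_severity", "unknown")
        return f"{prefix} {severity}: {step.get('insight_name', '')}"
    if step["type"] == "error":
        return f"{prefix} {step.get('error', '')}"
    # analysis, validation, and recommendation steps fall back to the message.
    return f"{prefix} {step.get('message', '')}"
```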

Feedback

User feedback on insights, recommendations, or exploration steps.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Feedback ID |
| project_id | string | Project ID |
| discovery_id | string | Discovery run ID |
| target_type | string | insight, recommendation, exploration_step |
| target_id | string | ID of the rated item |
| rating | string | like or dislike |
| comment | string | Optional comment (typically with dislikes) |
| created_at | timestamp | When feedback was submitted |
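
The enumerated values for target_type and rating can be checked before a feedback record is stored. This is a minimal validation sketch using the values listed above; the function name and the error messages are hypothetical, and the real service may enforce different or additional rules.

```python
TARGET_TYPES = {"insight", "recommendation", "exploration_step"}
RATINGS = {"like", "dislike"}


def validate_feedback(feedback):
    """Return a list of problems found in a Feedback dict (empty if valid)."""
    problems = []
    if feedback.get("target_type") not in TARGET_TYPES:
        problems.append("target_type must be one of: " + ", ".join(sorted(TARGET_TYPES)))
    if feedback.get("rating") not in RATINGS:
        problems.append("rating must be like or dislike")
    if not feedback.get("target_id"):
        problems.append("target_id is required")
    # comment is optional, so it is not checked here.
    return problems
```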

Project

Project configuration. Stored in the projects collection.

| Field | Type | Description |
| --- | --- | --- |
| id | string | MongoDB ObjectID |
| name | string | Project name |
| description | string | Project description |
| domain | string | Domain (e.g., gaming, social) |
| category | string | Category (e.g., match3, idle, casual, content_sharing) |
| warehouse | WarehouseConfig | Data warehouse configuration |
| llm | LLMConfig | LLM provider configuration |
| schedule | ScheduleConfig | Discovery schedule |
| profile | map | Domain-specific profile (from JSON Schema form) |
| prompts | ProjectPrompts | Per-project prompt overrides |
| status | string | Project status |
| last_run_at | timestamp | When the last discovery ran |
| last_run_status | string | Last run result |
| created_at | timestamp | When the project was created |
| updated_at | timestamp | Last update |

WarehouseConfig

| Field | Type | Description |
| --- | --- | --- |
| provider | string | Provider ID: bigquery, redshift |
| project_id | string | GCP project ID (BigQuery) |
| datasets | string[] | Dataset/schema names |
| location | string | Data location |
| filter_field | string | Multi-tenant filter column |
| filter_value | string | Multi-tenant filter value |
| config | map | Provider-specific key-value config |

LLMConfig

| Field | Type | Description |
| --- | --- | --- |
| provider | string | Provider ID: claude, openai, ollama, vertex-ai, bedrock |
| model | string | Model identifier (free text) |
| config | map | Provider-specific key-value config (e.g., project_id, location for Vertex AI) |

ScheduleConfig

| Field | Type | Description |
| --- | --- | --- |
| enabled | bool | Whether automatic discovery is enabled |
| cron_expr | string | Cron expression (e.g., 0 2 * * * = daily at 2 AM) |
| max_steps | int | Max exploration steps for scheduled runs |
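
To show how Project, WarehouseConfig, LLMConfig, and ScheduleConfig nest, here is a sketch of a project document with a basic sanity check. All values in example_project (names, dataset, model, step limit) are made up for illustration, the check_project function is hypothetical, and the provider sets only mirror the IDs listed on this page, which may not be exhaustive.

```python
WAREHOUSE_PROVIDERS = {"bigquery", "redshift"}
LLM_PROVIDERS = {"claude", "openai", "ollama", "vertex-ai", "bedrock"}


def check_project(project):
    """Return a list of problems in a Project dict (empty if none were found)."""
    problems = []
    if project["warehouse"]["provider"] not in WAREHOUSE_PROVIDERS:
        problems.append("unknown warehouse provider")
    if project["llm"]["provider"] not in LLM_PROVIDERS:
        problems.append("unknown LLM provider")
    schedule = project.get("schedule", {})
    # Assumes a standard five-field cron expression, as in the 0 2 * * * example.
    if schedule.get("enabled") and len(schedule.get("cron_expr", "").split()) != 5:
        problems.append("cron_expr must have five fields")
    return problems


# Placeholder values only; none of these come from a real deployment.
example_project = {
    "name": "Example Game",
    "domain": "gaming",
    "category": "match3",
    "warehouse": {
        "provider": "bigquery",
        "project_id": "example-gcp-project",
        "datasets": ["analytics"],
    },
    "llm": {"provider": "claude", "model": "example-model", "config": {}},
    "schedule": {"enabled": True, "cron_expr": "0 2 * * *", "max_steps": 50},
}
```

Calling check_project(example_project) returns an empty list, since the providers are among the listed IDs and the cron expression has five fields.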

Next Steps