
Data Models Reference

Version: 0.1.0

This page documents the core data structures used across DecisionBox.

DiscoveryResult

The complete output of a discovery run, stored in the discoveries MongoDB collection.

| Field | Type | Description |
|---|---|---|
| id | string | MongoDB ObjectID |
| project_id | string | Project that owns this discovery |
| domain | string | Domain (e.g., gaming) |
| category | string | Category (e.g., match3) |
| run_type | string | full (all areas), partial (only some areas requested, or some areas failed), failed (all areas failed) |
| areas_requested | string[] | Area IDs requested (empty for a full run) |
| discovery_date | timestamp | When the discovery ran |
| total_steps | int | Number of exploration steps executed |
| duration | int64 | Duration in nanoseconds |
| insights | Insight[] | Discovered patterns |
| recommendations | Recommendation[] | Actionable advice |
| summary | Summary | Aggregate stats |
| exploration_log | ExplorationStep[] | Every SQL query + AI reasoning |
| analysis_log | AnalysisStep[] | Full LLM dialog per analysis area |
| recommendation_log | RecommendationStep | Full LLM dialog for recommendations |
| validation_log | ValidationResult[] | Verification queries + results |
| created_at | timestamp | When the result was saved |

Insight

A discovered pattern or finding. Generated by the analysis phase.

| Field | Type | Description |
|---|---|---|
| id | string | Deterministic ID: {area}-{index} (e.g., churn-1, monetization-3). Auto-generated if the LLM omits it. |
| analysis_area | string | Which area found this (e.g., churn, levels) |
| name | string | Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return") |
| description | string | Detailed description with exact numbers and percentages |
| severity | string | critical, high, medium, or low |
| affected_count | int | Number of affected users (COUNT DISTINCT user_id) |
| risk_score | float64 | 0.0 to 1.0 risk assessment |
| confidence | float64 | 0.0 to 1.0 confidence level |
| metrics | map | Flexible key-value metrics (e.g., {"churn_rate": 0.67, "avg_sessions": 3.2}) |
| indicators | string[] | Specific metric indicators (e.g., "Session drop: 12min → 4min") |
| target_segment | string | Description of the affected user segment |
| source_steps | int[] | Exploration step numbers that support this insight |
| validation | InsightValidation | Warehouse verification result (if validated) |
| discovered_at | timestamp | When this insight was generated |
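The deterministic {area}-{index} ID rule could be implemented along these lines. fillInsightIDs is hypothetical. It assumes index is the insight's 1-based position within its analysis area, which matches examples like churn-1, and that the count includes insights that already carry an ID. The Insight struct keeps only a few of the documented fields.

```go
package main

import "fmt"

// Insight holds a subset of the documented fields; tags follow the table.
type Insight struct {
	ID           string `json:"id"`
	AnalysisArea string `json:"analysis_area"`
	Name         string `json:"name"`
	Severity     string `json:"severity"`
}

// fillInsightIDs (hypothetical) gives every insight with an empty ID the
// value "<analysis_area>-<n>", where n is its 1-based position among the
// insights of the same area. IDs the LLM already supplied are kept.
func fillInsightIDs(insights []Insight) {
	counters := map[string]int{}
	for i := range insights {
		area := insights[i].AnalysisArea
		counters[area]++
		if insights[i].ID == "" {
			insights[i].ID = fmt.Sprintf("%s-%d", area, counters[area])
		}
	}
}

func main() {
	insights := []Insight{
		{AnalysisArea: "churn", Name: "Day 0-to-Day 1 Drop", Severity: "critical"},
		{AnalysisArea: "churn", Name: "Late-game drop-off", Severity: "high"},
		{AnalysisArea: "monetization", Name: "Low first-purchase rate", Severity: "medium"},
	}
	fillInsightIDs(insights)
	for _, in := range insights {
		fmt.Println(in.ID, in.Name)
	}
}
```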

InsightValidation

Attached to an insight after warehouse verification.

| Field | Type | Description |
|---|---|---|
| status | string | confirmed, adjusted, rejected, or error |
| original_count | int | Count claimed by the AI |
| verified_count | int | Count verified from the warehouse |
| reasoning | string | Explanation of the validation result |
| validated_at | timestamp | When validation was performed |

Recommendation

An actionable suggestion based on discovered insights.

| Field | Type | Description |
|---|---|---|
| id | string | Recommendation ID |
| category | string | Category: churn, engagement, monetization, difficulty |
| title | string | Specific action title |
| description | string | Detailed explanation with numbers |
| priority | int | 1 (critical) to 5 (optional). P1 = highest priority. |
| target_segment | string | Exact segment criteria |
| segment_size | int | Number of users in the segment |
| expected_impact | Impact | Expected improvement |
| actions | string[] | Numbered implementation steps |
| related_insight_ids | string[] | IDs of insights this recommendation addresses (e.g., ["churn-1", "levels-2"]) |
| confidence | float64 | 0.0 to 1.0 confidence |

Impact

Expected impact of a recommendation.

| Field | Type | Description |
|---|---|---|
| metric | string | What metric improves (e.g., retention_rate, revenue) |
| estimated_improvement | string | Expected improvement (e.g., "+15-20%", "+$4,975/month") |
| reasoning | string | Why this improvement is expected |

Summary

Aggregate stats for a discovery run.

| Field | Type | Description |
|---|---|---|
| total_insights | int | Number of insights generated |
| total_recommendations | int | Number of recommendations generated |
| queries_executed | int | Number of SQL queries executed |
| errors | string[] | Error messages from failed analysis areas (if any) |

ExplorationStep

One step in the autonomous exploration phase. Represents a single LLM call + SQL query.

| Field | Type | Description |
|---|---|---|
| step | int | Step number (1-based) |
| timestamp | timestamp | When this step ran |
| action | string | Always query_data |
| thinking | string | AI's reasoning for this query |
| query_purpose | string | Short description of query intent |
| query | string | The SQL query executed |
| row_count | int | Number of rows returned |
| execution_time_ms | int64 | Query execution time in milliseconds |
| error | string | Error message if the query failed |
| fixed | bool | True if the query was auto-fixed after a SQL error |
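An exploration_log can be reduced to a few health numbers for a run. summarizeExploration is a hypothetical helper, and ExplorationStep below keeps only some of the documented fields. This reference does not say whether an auto-fixed step still carries its original error, so the sketch counts failed and fixed independently.

```go
package main

import "fmt"

type ExplorationStep struct {
	Step            int    `json:"step"`
	QueryPurpose    string `json:"query_purpose"`
	Query           string `json:"query"`
	RowCount        int    `json:"row_count"`
	ExecutionTimeMs int64  `json:"execution_time_ms"`
	Error           string `json:"error"`
	Fixed           bool   `json:"fixed"`
}

type explorationStats struct {
	Steps, Failed, Fixed int
	TotalMs              int64
}

// summarizeExploration (hypothetical) tallies an exploration_log: a step is
// counted as failed when error is non-empty, and as fixed when fixed is true.
func summarizeExploration(steps []ExplorationStep) explorationStats {
	var s explorationStats
	for _, st := range steps {
		s.Steps++
		s.TotalMs += st.ExecutionTimeMs
		if st.Error != "" {
			s.Failed++
		}
		if st.Fixed {
			s.Fixed++
		}
	}
	return s
}

func main() {
	steps := []ExplorationStep{
		{Step: 1, QueryPurpose: "daily active users", ExecutionTimeMs: 420},
		{Step: 2, QueryPurpose: "retention by cohort", ExecutionTimeMs: 910, Fixed: true},
		{Step: 3, QueryPurpose: "level funnel", Error: "column not found"},
	}
	s := summarizeExploration(steps)
	fmt.Printf("%d steps, %d failed, %d auto-fixed, %d ms total\n", s.Steps, s.Failed, s.Fixed, s.TotalMs)
}
```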

AnalysisStep

Full LLM dialog for one analysis area. Captures the complete prompt and response.

| Field | Type | Description |
|---|---|---|
| area_id | string | Analysis area ID (e.g., churn) |
| area_name | string | Display name (e.g., Churn Risks) |
| run_at | timestamp | When this analysis ran |
| relevant_queries | int | Number of exploration queries used as context |
| tokens_in | int | Input tokens consumed |
| tokens_out | int | Output tokens generated |
| duration_ms | int64 | LLM call duration in milliseconds |
| insight_count | int | Number of insights extracted |
| error | string | Error message if the analysis failed |
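Because each analysis area has its own AnalysisStep, a run's LLM usage can be totaled from analysis_log. analysisTotals is a hypothetical helper. It also collects the IDs of areas whose error field is set, which is the kind of information Summary.errors reports.

```go
package main

import "fmt"

type AnalysisStep struct {
	AreaID          string `json:"area_id"`
	AreaName        string `json:"area_name"`
	RelevantQueries int    `json:"relevant_queries"`
	TokensIn        int    `json:"tokens_in"`
	TokensOut       int    `json:"tokens_out"`
	DurationMs      int64  `json:"duration_ms"`
	InsightCount    int    `json:"insight_count"`
	Error           string `json:"error"`
}

// analysisTotals (hypothetical) sums token usage and insight counts across
// an analysis_log and lists the areas that reported an error.
func analysisTotals(steps []AnalysisStep) (tokensIn, tokensOut, insights int, failedAreas []string) {
	for _, a := range steps {
		tokensIn += a.TokensIn
		tokensOut += a.TokensOut
		insights += a.InsightCount
		if a.Error != "" {
			failedAreas = append(failedAreas, a.AreaID)
		}
	}
	return tokensIn, tokensOut, insights, failedAreas
}

func main() {
	steps := []AnalysisStep{
		{AreaID: "churn", AreaName: "Churn Risks", TokensIn: 18000, TokensOut: 2400, InsightCount: 3},
		{AreaID: "levels", TokensIn: 15000, Error: "timeout"},
	}
	in, out, n, failed := analysisTotals(steps)
	fmt.Printf("tokens in=%d out=%d, insights=%d, failed areas=%v\n", in, out, n, failed)
}
```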

ValidationResult

Warehouse verification of an insight's claims.

| Field | Type | Description |
|---|---|---|
| insight_id | string | ID of the validated insight |
| analysis_area | string | Area this insight belongs to |
| claimed_count | int | Count claimed by the AI |
| verified_count | int | Count verified from the warehouse |
| status | string | confirmed, adjusted, rejected, error |
| reasoning | string | Explanation of the result |
| query | string | The verification SQL query |
| validated_at | timestamp | When validation was performed |
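ValidationResult (in validation_log) and InsightValidation (on each insight) carry overlapping data. The sketch below shows one way to copy a result onto its insight by matching insight_id. It assumes claimed_count maps to original_count, since both are described as the count claimed by the AI. attachValidations is hypothetical, and the structs are trimmed to the relevant fields.

```go
package main

import (
	"fmt"
	"time"
)

type InsightValidation struct {
	Status        string    `json:"status"`
	OriginalCount int       `json:"original_count"`
	VerifiedCount int       `json:"verified_count"`
	Reasoning     string    `json:"reasoning"`
	ValidatedAt   time.Time `json:"validated_at"`
}

type Insight struct {
	ID            string             `json:"id"`
	AffectedCount int                `json:"affected_count"`
	Validation    *InsightValidation `json:"validation,omitempty"`
}

type ValidationResult struct {
	InsightID     string    `json:"insight_id"`
	AnalysisArea  string    `json:"analysis_area"`
	ClaimedCount  int       `json:"claimed_count"`
	VerifiedCount int       `json:"verified_count"`
	Status        string    `json:"status"`
	Reasoning     string    `json:"reasoning"`
	Query         string    `json:"query"`
	ValidatedAt   time.Time `json:"validated_at"`
}

// attachValidations (hypothetical) sets Validation on each insight whose ID
// matches a result's insight_id and returns the IDs that matched nothing.
func attachValidations(insights []Insight, results []ValidationResult) []string {
	byID := make(map[string]*Insight, len(insights))
	for i := range insights {
		byID[insights[i].ID] = &insights[i]
	}
	var unmatched []string
	for _, r := range results {
		in, ok := byID[r.InsightID]
		if !ok {
			unmatched = append(unmatched, r.InsightID)
			continue
		}
		in.Validation = &InsightValidation{
			Status:        r.Status,
			OriginalCount: r.ClaimedCount, // assumed equivalent fields
			VerifiedCount: r.VerifiedCount,
			Reasoning:     r.Reasoning,
			ValidatedAt:   r.ValidatedAt,
		}
	}
	return unmatched
}

func main() {
	insights := []Insight{{ID: "churn-1", AffectedCount: 1200}}
	results := []ValidationResult{{InsightID: "churn-1", AnalysisArea: "churn", ClaimedCount: 1200,
		VerifiedCount: 1050, Status: "adjusted", ValidatedAt: time.Now()}}
	unmatched := attachValidations(insights, results)
	fmt.Println(insights[0].Validation.Status, insights[0].Validation.VerifiedCount, unmatched)
}
```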

DiscoveryRunStatus

Live status of a running discovery, stored in the discovery_runs collection and updated in real time.

| Field | Type | Description |
|---|---|---|
| id | string | Run ID |
| project_id | string | Project being discovered |
| status | string | pending, running, completed, failed, cancelled |
| phase | string | Current phase: init, schema_discovery, exploration, analysis, validation, recommendations, saving, complete |
| phase_detail | string | Human-readable phase description |
| progress | int | Completion percentage, 0 to 100 |
| started_at | timestamp | When the run started |
| updated_at | timestamp | Last status update |
| completed_at | timestamp | When the run finished (null while running) |
| error | string | Error message (if failed) |
| steps | RunStep[] | Live step feed |
| total_queries | int | Total SQL queries executed |
| successful_queries | int | Queries that returned results |
| failed_queries | int | Queries that errored |
| insights_found | int | Insights generated so far |
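A client that polls discovery_runs mainly needs two things: whether the run has finished, and how far along it is. The sketch assumes phases progress in the order listed above and that completed, failed and cancelled are the terminal statuses. Neither isTerminal nor phaseIndex is part of DecisionBox.

```go
package main

import "fmt"

// phases lists the documented phase values, assumed to run in this order.
var phases = []string{"init", "schema_discovery", "exploration", "analysis",
	"validation", "recommendations", "saving", "complete"}

// isTerminal reports whether a status is final (assumed from the status names).
func isTerminal(status string) bool {
	switch status {
	case "completed", "failed", "cancelled":
		return true
	}
	return false
}

// phaseIndex returns the 0-based position of a phase, or -1 if unknown.
func phaseIndex(phase string) int {
	for i, p := range phases {
		if p == phase {
			return i
		}
	}
	return -1
}

func main() {
	phase, status := "analysis", "running"
	fmt.Printf("phase %d of %d (%s), terminal=%v\n",
		phaseIndex(phase)+1, len(phases), phase, isTerminal(status))
}
```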

RunStep

One step in the live progress feed.

| Field | Type | Description |
|---|---|---|
| phase | string | Which phase this step belongs to |
| step_num | int | Step number |
| timestamp | timestamp | When this step occurred |
| type | string | query, insight, analysis, validation, recommendation, error |
| message | string | Step description |
| llm_thinking | string | AI's reasoning text |
| query | string | SQL query (if type=query) |
| query_result | string | Query result summary |
| row_count | int | Rows returned |
| query_time_ms | int | Query execution time in milliseconds |
| query_fixed | bool | Whether the query was auto-fixed |
| insight_name | string | Insight name (if type=insight) |
| insight_severity | string | Insight severity (if type=insight) |
| error | string | Error message (if type=error) |
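The counters on DiscoveryRunStatus summarize its steps feed. recordStep is a hypothetical sketch of keeping the counters in sync as RunStep entries arrive. Two details are assumptions: a type=query step with a non-empty error counts as a failed query, and steps are numbered by their position in the feed. The table above attaches error to type=error steps, so the real bookkeeping may differ.

```go
package main

import (
	"fmt"
	"time"
)

type RunStep struct {
	Phase           string    `json:"phase"`
	StepNum         int       `json:"step_num"`
	Timestamp       time.Time `json:"timestamp"`
	Type            string    `json:"type"`
	Message         string    `json:"message"`
	Query           string    `json:"query,omitempty"`
	RowCount        int       `json:"row_count,omitempty"`
	QueryTimeMs     int       `json:"query_time_ms,omitempty"`
	QueryFixed      bool      `json:"query_fixed,omitempty"`
	InsightName     string    `json:"insight_name,omitempty"`
	InsightSeverity string    `json:"insight_severity,omitempty"`
	Error           string    `json:"error,omitempty"`
}

type DiscoveryRunStatus struct {
	Status            string    `json:"status"`
	Phase             string    `json:"phase"`
	Steps             []RunStep `json:"steps"`
	TotalQueries      int       `json:"total_queries"`
	SuccessfulQueries int       `json:"successful_queries"`
	FailedQueries     int       `json:"failed_queries"`
	InsightsFound     int       `json:"insights_found"`
}

// recordStep (hypothetical) appends a step to the feed and updates counters.
func (s *DiscoveryRunStatus) recordStep(step RunStep) {
	step.StepNum = len(s.Steps) + 1
	if step.Timestamp.IsZero() {
		step.Timestamp = time.Now()
	}
	s.Steps = append(s.Steps, step)
	switch step.Type {
	case "query":
		s.TotalQueries++
		if step.Error != "" {
			s.FailedQueries++
		} else {
			s.SuccessfulQueries++
		}
	case "insight":
		s.InsightsFound++
	}
}

func main() {
	run := DiscoveryRunStatus{Status: "running", Phase: "exploration"}
	run.recordStep(RunStep{Phase: "exploration", Type: "query",
		Query: "SELECT COUNT(DISTINCT user_id) FROM events", RowCount: 1})
	run.recordStep(RunStep{Phase: "analysis", Type: "insight",
		InsightName: "Day 0-to-Day 1 Drop", InsightSeverity: "critical"})
	fmt.Printf("queries=%d ok=%d failed=%d insights=%d\n",
		run.TotalQueries, run.SuccessfulQueries, run.FailedQueries, run.InsightsFound)
}
```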

Feedback

User feedback on insights, recommendations, or exploration steps.

| Field | Type | Description |
|---|---|---|
| id | string | Feedback ID |
| project_id | string | Project ID |
| discovery_id | string | Discovery run ID |
| target_type | string | insight, recommendation, exploration_step |
| target_id | string | ID of the rated item |
| rating | string | like or dislike |
| comment | string | Optional comment (typically with dislikes) |
| created_at | timestamp | When feedback was submitted |
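Because target_type and rating are enumerations, a client can reject malformed feedback before submitting it. validateFeedback is a hypothetical check built from the values listed above. Treating target_id as required is an assumption, and comment is left optional as documented.

```go
package main

import (
	"errors"
	"fmt"
)

// Feedback mirrors a subset of the documented fields (id, created_at omitted).
type Feedback struct {
	ProjectID   string `json:"project_id"`
	DiscoveryID string `json:"discovery_id"`
	TargetType  string `json:"target_type"`
	TargetID    string `json:"target_id"`
	Rating      string `json:"rating"`
	Comment     string `json:"comment,omitempty"`
}

// validateFeedback (hypothetical) checks the enumerated fields.
func validateFeedback(f Feedback) error {
	switch f.TargetType {
	case "insight", "recommendation", "exploration_step":
	default:
		return fmt.Errorf("unknown target_type %q", f.TargetType)
	}
	switch f.Rating {
	case "like", "dislike":
	default:
		return fmt.Errorf("unknown rating %q", f.Rating)
	}
	if f.TargetID == "" {
		return errors.New("target_id is required")
	}
	return nil
}

func main() {
	f := Feedback{TargetType: "insight", TargetID: "churn-1", Rating: "dislike",
		Comment: "Segment overlaps with churn-2"}
	fmt.Println(validateFeedback(f))                                  // <nil>
	fmt.Println(validateFeedback(Feedback{TargetType: "summary"}) != nil) // true
}
```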

Project

Project configuration, stored in the projects collection.

| Field | Type | Description |
|---|---|---|
| id | string | MongoDB ObjectID |
| name | string | Project name |
| description | string | Project description |
| domain | string | Domain (e.g., gaming) |
| category | string | Category (e.g., match3) |
| warehouse | WarehouseConfig | Data warehouse configuration |
| llm | LLMConfig | LLM provider configuration |
| schedule | ScheduleConfig | Discovery schedule |
| profile | map | Domain-specific profile (from JSON Schema form) |
| prompts | ProjectPrompts | Per-project prompt overrides |
| status | string | Project status |
| last_run_at | timestamp | When the last discovery ran |
| last_run_status | string | Last run result |
| created_at | timestamp | When the project was created |
| updated_at | timestamp | Last update |

WarehouseConfig

| Field | Type | Description |
|---|---|---|
| provider | string | Provider ID: bigquery, redshift |
| project_id | string | GCP project ID (BigQuery) |
| datasets | string[] | Dataset/schema names |
| location | string | Data location |
| filter_field | string | Multi-tenant filter column |
| filter_value | string | Multi-tenant filter value |
| config | map | Provider-specific key-value config |

LLMConfig

| Field | Type | Description |
|---|---|---|
| provider | string | Provider ID: claude, openai, ollama, vertex-ai, bedrock |
| model | string | Model identifier (free text) |
| config | map | Provider-specific key-value config (e.g., project_id, location for Vertex AI) |

ScheduleConfig

| Field | Type | Description |
|---|---|---|
| enabled | bool | Whether automatic discovery is enabled |
| cron_expr | string | Cron expression (e.g., `0 2 * * *` = daily at 2 AM) |
| max_steps | int | Max exploration steps for scheduled runs |

Next Steps