Data Models Reference

Version: 0.1.0

This page documents the core data structures used across DecisionBox.

DiscoveryResult

A complete discovery run output. Stored in the discoveries MongoDB collection.

Field	Type	Description
`id`	string	MongoDB ObjectID
`project_id`	string	Project that owns this discovery
`domain`	string	Domain (e.g., `gaming`)
`category`	string	Category (e.g., `match3`)
`run_type`	string	`full` (all areas), `partial` (some areas or some failed), `failed` (all areas failed)
`areas_requested`	string[]	Area IDs requested (empty for full run)
`discovery_date`	timestamp	When the discovery ran
`total_steps`	int	Number of exploration steps executed
`duration`	int64	Duration in nanoseconds
`insights`	Insight[]	Discovered patterns
`recommendations`	Recommendation[]	Actionable advice
`summary`	Summary	Aggregate stats
`exploration_log`	ExplorationStep[]	Every SQL query + AI reasoning
`analysis_log`	AnalysisStep[]	Full LLM dialog per analysis area
`recommendation_log`	RecommendationStep	Full LLM dialog for recommendations
`validation_log`	ValidationResult[]	Verification queries + results
`created_at`	timestamp	When result was saved

Insight

A discovered pattern or finding. Generated by the analysis phase.

Field	Type	Description
`id`	string	Deterministic ID: `{area}-{index}` (e.g., `churn-1`, `monetization-3`). Auto-generated if LLM omits it.
`analysis_area`	string	Which area found this (e.g., `churn`, `levels`)
`name`	string	Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return")
`description`	string	Detailed description with exact numbers and percentages
`severity`	string	`critical`, `high`, `medium`, or `low`
`affected_count`	int	Number of affected users (COUNT DISTINCT user_id)
`risk_score`	float64	0.0 to 1.0 risk assessment
`confidence`	float64	0.0 to 1.0 confidence level
`metrics`	map	Flexible key-value metrics (e.g., `{"churn_rate": 0.67, "avg_sessions": 3.2}`)
`indicators`	string[]	Specific metric indicators (e.g., "Session drop: 12min → 4min")
`target_segment`	string	Description of affected user segment
`source_steps`	int[]	Exploration step numbers that support this insight
`validation`	InsightValidation	Warehouse verification result (if validated)
`discovered_at`	timestamp	When this insight was generated

InsightValidation

Attached to an insight after warehouse verification.

Field	Type	Description
`status`	string	`confirmed`, `adjusted`, `rejected`, or `error`
`original_count`	int	Count claimed by the AI
`verified_count`	int	Count verified from the warehouse
`reasoning`	string	Explanation of the validation result
`validated_at`	timestamp	When validation was performed

Recommendation

An actionable suggestion based on discovered insights.

Field	Type	Description
`id`	string	Recommendation ID
`category`	string	Category: `churn`, `engagement`, `monetization`, `difficulty`
`title`	string	Specific action title
`description`	string	Detailed explanation with numbers
`priority`	int	1 (critical) to 5 (optional). P1 = highest priority.
`target_segment`	string	Exact segment criteria
`segment_size`	int	Number of users in the segment
`expected_impact`	Impact	Expected improvement
`actions`	string[]	Numbered implementation steps
`related_insight_ids`	string[]	IDs of insights this recommendation addresses (e.g., `["churn-1", "levels-2"]`)
`confidence`	float64	0.0 to 1.0 confidence

Impact

Expected impact of a recommendation.

Field	Type	Description
`metric`	string	What metric improves (e.g., `retention_rate`, `revenue`)
`estimated_improvement`	string	Expected improvement (e.g., "+15-20%", "+$4,975/month")
`reasoning`	string	Why this improvement is expected

Summary

Aggregate stats for a discovery run.

Field	Type	Description
`total_insights`	int	Number of insights generated
`total_recommendations`	int	Number of recommendations generated
`queries_executed`	int	Number of SQL queries executed
`errors`	string[]	Error messages from failed analysis areas (if any)

ExplorationStep

One step in the autonomous exploration phase. Represents a single LLM call + SQL query.

Field	Type	Description
`step`	int	Step number (1-based)
`timestamp`	timestamp	When this step ran
`action`	string	Always `query_data`
`thinking`	string	AI's reasoning for this query
`query_purpose`	string	Short description of query intent
`query`	string	The SQL query executed
`row_count`	int	Number of rows returned
`execution_time_ms`	int64	Query execution time in milliseconds
`error`	string	Error message if query failed
`fixed`	bool	True if the query was auto-fixed after a SQL error

AnalysisStep

Full LLM dialog for one analysis area. Captures the complete prompt and response.

Field	Type	Description
`area_id`	string	Analysis area ID (e.g., `churn`)
`area_name`	string	Display name (e.g., `Churn Risks`)
`run_at`	timestamp	When this analysis ran
`relevant_queries`	int	Number of exploration queries used as context
`tokens_in`	int	Input tokens consumed
`tokens_out`	int	Output tokens generated
`duration_ms`	int64	LLM call duration in milliseconds
`insight_count`	int	Number of insights extracted
`error`	string	Error message if analysis failed

ValidationResult

Warehouse verification of an insight's claims.

Field	Type	Description
`insight_id`	string	ID of the validated insight
`analysis_area`	string	Area this insight belongs to
`claimed_count`	int	Count claimed by the AI
`verified_count`	int	Count verified from the warehouse
`status`	string	`confirmed`, `adjusted`, `rejected`, `error`
`reasoning`	string	Explanation of the result
`query`	string	The verification SQL query
`validated_at`	timestamp	When validation was performed

DiscoveryRunStatus

Live status of a running discovery. Stored in discovery_runs collection, updated in real-time.

Field	Type	Description
`id`	string	Run ID
`project_id`	string	Project being discovered
`status`	string	`pending`, `running`, `completed`, `failed`, `cancelled`
`phase`	string	Current phase: `init`, `schema_discovery`, `exploration`, `analysis`, `validation`, `recommendations`, `saving`, `complete`
`phase_detail`	string	Human-readable phase description
`progress`	int	0 to 100 percentage
`started_at`	timestamp	When the run started
`updated_at`	timestamp	Last status update
`completed_at`	timestamp	When the run finished (null if running)
`error`	string	Error message (if failed)
`steps`	RunStep[]	Live step feed
`total_queries`	int	Total SQL queries executed
`successful_queries`	int	Queries that returned results
`failed_queries`	int	Queries that errored
`insights_found`	int	Insights generated so far

RunStep

One step in the live progress feed.

Field	Type	Description
`phase`	string	Which phase this step belongs to
`step_num`	int	Step number
`timestamp`	timestamp	When this step occurred
`type`	string	`query`, `insight`, `analysis`, `validation`, `recommendation`, `error`
`message`	string	Step description
`llm_thinking`	string	AI's reasoning text
`query`	string	SQL query (if type=query)
`query_result`	string	Query result summary
`row_count`	int	Rows returned
`query_time_ms`	int	Query execution time
`query_fixed`	bool	Whether query was auto-fixed
`insight_name`	string	Insight name (if type=insight)
`insight_severity`	string	Insight severity (if type=insight)
`error`	string	Error message (if type=error)

Feedback

User feedback on insights, recommendations, or exploration steps.

Field	Type	Description
`id`	string	Feedback ID
`project_id`	string	Project ID
`discovery_id`	string	Discovery run ID
`target_type`	string	`insight`, `recommendation`, `exploration_step`
`target_id`	string	ID of the rated item
`rating`	string	`like` or `dislike`
`comment`	string	Optional comment (typically with dislikes)
`created_at`	timestamp	When feedback was submitted

Project

Project configuration. Stored in projects collection.

Field	Type	Description
`id`	string	MongoDB ObjectID
`name`	string	Project name
`description`	string	Project description
`domain`	string	Domain (e.g., `gaming`)
`category`	string	Category (e.g., `match3`)
`warehouse`	WarehouseConfig	Data warehouse configuration
`llm`	LLMConfig	LLM provider configuration
`schedule`	ScheduleConfig	Discovery schedule
`profile`	map	Domain-specific profile (from JSON Schema form)
`prompts`	ProjectPrompts	Per-project prompt overrides
`status`	string	Project status
`last_run_at`	timestamp	When the last discovery ran
`last_run_status`	string	Last run result
`created_at`	timestamp	When the project was created
`updated_at`	timestamp	Last update

WarehouseConfig

Field	Type	Description
`provider`	string	Provider ID: `bigquery`, `redshift`
`project_id`	string	GCP project ID (BigQuery)
`datasets`	string[]	Dataset/schema names
`location`	string	Data location
`filter_field`	string	Multi-tenant filter column
`filter_value`	string	Multi-tenant filter value
`config`	map	Provider-specific key-value config

LLMConfig

Field	Type	Description
`provider`	string	Provider ID: `claude`, `openai`, `ollama`, `vertex-ai`, `bedrock`
`model`	string	Model identifier (free text)
`config`	map	Provider-specific key-value config (e.g., `project_id`, `location` for Vertex AI)

ScheduleConfig

Field	Type	Description
`enabled`	bool	Whether automatic discovery is enabled
`cron_expr`	string	Cron expression (e.g., `0 2 * * *` = daily at 2 AM)
`max_steps`	int	Max exploration steps for scheduled runs

Next Steps

API Reference — Endpoints that return these models
Configuration Reference — Environment variables

DiscoveryResult​

Insight​

InsightValidation​

Recommendation​

Impact​

Summary​

ExplorationStep​

AnalysisStep​

ValidationResult​

DiscoveryRunStatus​

RunStep​

Feedback​

Project​

WarehouseConfig​

LLMConfig​

ScheduleConfig​

Next Steps​