Prompt Variables Reference
Version: 0.4.0
Template variables in prompt files use the {{VARIABLE_NAME}} syntax. The agent replaces them with project-specific values at runtime, just before sending each prompt to the LLM.
All Variables
| Variable | Used In | Replaced With | Type |
|---|---|---|---|
| {{PROFILE}} | base_context.md | JSON-encoded project profile | JSON string |
| {{PREVIOUS_CONTEXT}} | base_context.md | Previous discoveries + user feedback | Multi-line text |
| {{SCHEMA_INFO}} | exploration.md | Level-0 catalog only — one line per table (name, column count, row count, keyword hints). Per-table column lists and sample rows are fetched on demand by the LLM via lookup_schema / search_tables actions. | Plain text |
| {{DATASET}} | exploration.md, analysis_*.md | Dataset/schema names | Comma-separated string |
| {{FILTER}} | exploration.md | SQL WHERE clause | SQL fragment |
| {{FILTER_CONTEXT}} | exploration.md | Human-readable filter description | Text |
| {{FILTER_RULE}} | exploration.md | SQL construction rule for the filter | Text |
| {{ANALYSIS_AREAS}} | exploration.md | List of analysis areas with descriptions | Multi-line text |
| {{DIALECT}} | base_context.md, exploration.md, analysis_*.md, recommendations.md | Warehouse SQL dialect name returned by provider.SQLDialect() (e.g. "BigQuery Standard SQL", "Microsoft SQL Server T-SQL …", "PostgreSQL …"). | Plain text |
| {{REF:tablename}} | any prompt | Dialect-correct fully-qualified table reference produced by provider.QuoteRef(refDataset, "tablename") — e.g. `events_prod`.`sessions` on BigQuery, [dbo].[sessions] on SQL Server, "public"."sessions" on PostgreSQL. The identifier must match [A-Za-z_][A-Za-z0-9_]*; placeholders that don't match (empty name, embedded dot, leading digit) are passed through unchanged so a malformed marker is visible in the rendered prompt. | Plain text |
| {{TOTAL_QUERIES}} | analysis_*.md | Count of relevant exploration queries | Integer |
| {{QUERY_RESULTS}} | analysis_*.md | Exploration query results for this area | JSON array |
| {{DISCOVERY_DATE}} | recommendations.md | Current date (ISO format) | Date string |
| {{INSIGHTS_SUMMARY}} | recommendations.md | Text summary of insight counts | Text |
| {{INSIGHTS_DATA}} | recommendations.md | Full insight array with IDs | JSON array |
Detailed Reference
{{PROFILE}}
Source: project.profile from MongoDB
The project profile serialized as JSON. Contains domain-specific fields defined by the domain pack's profile schema.
Example value:
{
"basic_info": {
"genre": "puzzle",
"sub_genre": "match-3",
"platforms": ["iOS", "Android"],
"target_audience": "Adult Female 30-65+"
},
"gameplay": {
"core_mechanic": "match3",
"game_type": "level_based",
"session_type": "short",
"avg_session_duration_minutes": 6
},
"monetization": {
"model": "freemium",
"has_ads": true,
"has_iap": true
},
"boosters": [
{"name": "Magnet", "usage": "consumable", "starting_amount": 3, "can_purchase": true}
],
"kpis": {
"retention_d1_target": 40,
"arpu_target": 0.50
}
}
If profile is empty: shows "No project profile configured. Provide general analysis."
{{PREVIOUS_CONTEXT}}
Source: Last 5 discoveries + all feedback for the project
Built by the agent's buildPreviousContext() function. Contains:
- Discovery count and date: "This is discovery run #5. Last discovery: 2026-03-12."
- Previous insights: Names, areas, severity, dates — with dedup instruction
- Disliked insights: With user comments — "AVOID similar conclusions"
- Liked insights: "MONITOR for changes"
- Previous recommendations: "Don't repeat unless changed"
Example value:
This is discovery run #5. Last discovery: 2026-03-12.
### Previously Found Insights
These insights were already discovered. Do NOT repeat them unless the data has significantly changed.
- **Day 0-to-Day 1 Drop: 67% Never Return** [churn, critical] — 8298 affected (2026-03-11)
- **Level 11 Difficulty Cliff** [churn, high] — 642 affected (2026-03-12)
### User Feedback — Disliked Insights (AVOID)
- **Low engagement pattern** — user comment: "not relevant to our game"
### User Feedback — Liked Insights (MONITOR)
- **Day 0-to-Day 1 Drop: 67% Never Return**
### Previously Given Recommendations
Don't repeat these unless the situation has changed.
- P1: Reduce Level 11-15 Difficulty by 25% (difficulty)
If first run: Empty string (no previous context).
{{SCHEMA_INFO}}
Source: Schema renderer — Level-0 catalog only.
Compact catalog of every table the project indexed in Qdrant. One line per table: name | Nc | Xk/M rows | keyword1, keyword2 | -> joins_to. The catalog is sized to the project's prompt budget; archive-shaped tables (*_2023, *_LOG, *_BKP, *_TMP, …) and the smallest tables by row count are dropped first when the catalog exceeds the budget.
Example value:
analytics_data.users | 14c | 50K rows | users, accounts | -> sessions
analytics_data.sessions | 9c | 500K rows | sessions, activity | -> users, events
analytics_data.events | 22c | 4.2M rows | events, clickstream
This is the only schema content sent up-front. Column lists and sample rows are fetched on demand by the LLM during exploration:
- lookup_schema: pulls L1 detail (columns + 3 sample rows) for up to 10 tables per call. Per-run budget: 30 lookup calls.
- search_tables: runs a semantic query against the per-project Qdrant index. Per-run budget: 30 search calls. TopK clamped to 30.
Both budgets and the per-call cap are surfaced to the LLM in the initial exploration message; see docs/architecture/agent-on-demand-schema.md for the architectural rationale (the previous "always send L1" model exhausted the Bedrock 1M-token context on long runs).
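The clamping behavior described above (per-call table cap, per-run call budgets, TopK ceiling) can be pictured as a small counter the orchestrator decrements per call. The sketch below is illustrative only — the type and method names are hypothetical, not the agent's actual implementation:

```go
package main

import "fmt"

// schemaBudget tracks the per-run caps described above.
// All names here are illustrative stand-ins.
type schemaBudget struct {
	lookupCallsLeft int // 30 lookup_schema calls per run
	searchCallsLeft int // 30 search_tables calls per run
}

// allowLookup clamps a lookup_schema request: at most 10 tables
// per call, and no further calls once the per-run budget is spent.
func (b *schemaBudget) allowLookup(tables []string) ([]string, bool) {
	if b.lookupCallsLeft <= 0 {
		return nil, false
	}
	b.lookupCallsLeft--
	if len(tables) > 10 {
		tables = tables[:10]
	}
	return tables, true
}

// allowSearch clamps TopK to 30 and enforces the search budget.
func (b *schemaBudget) allowSearch(topK int) (int, bool) {
	if b.searchCallsLeft <= 0 {
		return 0, false
	}
	b.searchCallsLeft--
	if topK > 30 {
		topK = 30
	}
	return topK, true
}

func main() {
	b := &schemaBudget{lookupCallsLeft: 30, searchCallsLeft: 30}
	tables, ok := b.allowLookup([]string{"users", "sessions"})
	fmt.Println(ok, len(tables), b.lookupCallsLeft) // true 2 29
}
```

Because the budgets are surfaced to the LLM up front, the model can plan which tables to inspect instead of discovering the limits by hitting them.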
{{DATASET}}
Source: project.warehouse.datasets array
Comma-separated dataset names (BigQuery) or schema names (Redshift).
Example values:
- BigQuery: "analytics_data, features_prod"
- Redshift: "public"
{{FILTER}}
Source: project.warehouse.filter_field and filter_value
SQL WHERE clause for multi-tenant data filtering.
Example value: "WHERE app_id = '68a42f378e3b227c8e41b0e5'"
If no filter configured: Empty string.
{{FILTER_CONTEXT}}
Source: Same as {{FILTER}}
Human-readable explanation of the filter for the AI.
Example value: "Data is filtered to app_id='68a42f378e3b227c8e41b0e5'. Always include this filter in your queries."
{{FILTER_RULE}}
Source: Same as {{FILTER}}
SQL construction rule.
Example value: "Always include: WHERE app_id = '68a42f378e3b227c8e41b0e5' in all queries."
{{DIALECT}}
Source: provider.SQLDialect() on the project's connected warehouse provider.
The SQL dialect description rendered into every prompt so the LLM emits dialect-correct SQL on the first generation rather than round-tripping through the SQL-fix LLM call. It is also mirrored into the insight validator's verification prompt via the `**SQL Dialect**: %s` line.
Example values:
- BigQuery: "BigQuery Standard SQL"
- PostgreSQL: "PostgreSQL (ANSI SQL with extensions: DISTINCT ON, CTEs, window functions, JSON operators, array types)"
- SQL Server: "Microsoft SQL Server T-SQL (ANSI SQL with extensions: TOP, OFFSET/FETCH, CTEs, window functions, PIVOT/UNPIVOT, MERGE, CROSS APPLY, OUTER APPLY, XML/JSON functions)"
{{REF:tablename}}
Source: provider.QuoteRef(refDataset, "tablename") on the project's connected warehouse provider, where refDataset is the first dataset in project.warehouse.datasets.
A dialect-correct fully-qualified table reference rendered with the dialect's native identifier-quoting convention. Use this placeholder in every SQL example you author in exploration.md, analysis_*.md, or any custom prompt — the orchestrator substitutes it with the warehouse-specific shape so the LLM sees an example it can copy verbatim. The placeholder name must match [A-Za-z_][A-Za-z0-9_]* (a single SQL-style identifier; no embedded dots, no leading digit). Malformed placeholders pass through unchanged so the typo is visible in the rendered output rather than silently dropped.
Example values (for refDataset = "events_prod"):
- BigQuery / Databricks: `events_prod`.`sessions`
- PostgreSQL / Redshift / Snowflake: "events_prod"."sessions"
- SQL Server (MSSQL): [events_prod].[sessions]
Usage in prompts:
SELECT user_id FROM {{REF:sessions}}
JOIN {{REF:user_engagement_summary}} USING (user_id)
The agent's internal lookup_schema array continues to take the unquoted dataset.table form (["{{DATASET}}.users", ...]) — that's not warehouse SQL, so it doesn't need dialect-correct quoting.
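The identifier rule and pass-through behavior described above can be sketched with a single regex pass. In this sketch, quoteRef is a hard-coded stand-in for provider.QuoteRef (shown here with the BigQuery backtick shape); only the matching and pass-through logic mirrors the documented behavior:

```go
package main

import (
	"fmt"
	"regexp"
)

// refToken matches only well-formed {{REF:name}} markers: a single
// SQL-style identifier, so {{REF:bad.name}} or {{REF:1x}} never match
// and pass through unchanged, keeping the typo visible in the output.
var refToken = regexp.MustCompile(`\{\{REF:([A-Za-z_][A-Za-z0-9_]*)\}\}`)

// quoteRef is an illustrative stand-in for provider.QuoteRef,
// hard-coded to the BigQuery backtick convention.
func quoteRef(dataset, table string) string {
	return fmt.Sprintf("`%s`.`%s`", dataset, table)
}

// substituteRefs replaces every well-formed {{REF:...}} marker with a
// fully-qualified, dialect-quoted table reference.
func substituteRefs(prompt, refDataset string) string {
	return refToken.ReplaceAllStringFunc(prompt, func(m string) string {
		name := refToken.FindStringSubmatch(m)[1]
		return quoteRef(refDataset, name)
	})
}

func main() {
	in := "SELECT user_id FROM {{REF:sessions}} -- {{REF:bad.name}} stays"
	// Well-formed marker is quoted; the malformed one is left as-is.
	fmt.Println(substituteRefs(in, "events_prod"))
}
```

Passing malformed markers through unchanged is a deliberate fail-loud choice: a visible `{{REF:...}}` in a rendered prompt is easier to diagnose than a silently dropped table reference.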
{{ANALYSIS_AREAS}}
Source: Domain pack analysis areas
Formatted list of all analysis areas the agent should explore.
Example value:
- Churn Risks: Players at risk of leaving the game
- Engagement Patterns: Player behavior and session trends
- Monetization Opportunities: Revenue optimization and conversion opportunities
- Level Difficulty: Difficulty spikes and frustration points
- Booster Usage: Power-up usage patterns and purchase opportunities
{{TOTAL_QUERIES}}
Source: Count of exploration queries relevant to this analysis area
Example value: "6"
{{QUERY_RESULTS}}
Source: Exploration results filtered by area keywords
JSON array of exploration steps relevant to the current analysis area. Each entry includes the step number, timestamp, action, the model's thinking, the SQL query, its results, and timing metadata (row_count, execution_time_ms).
Example value:
[
{
"step": 1,
"timestamp": "2026-03-14T10:30:05Z",
"action": "query_data",
"thinking": "Let me check retention rates by cohort...",
"query": "SELECT cohort_date, day_1_retention FROM retention_cohorts WHERE app_id = '...' ORDER BY cohort_date DESC LIMIT 30",
"query_result": [
{"cohort_date": "2026-03-01", "day_1_retention": 33.2},
{"cohort_date": "2026-02-28", "day_1_retention": 31.8}
],
"row_count": 30,
"execution_time_ms": 450
},
{
"step": 3,
"thinking": "Retention is declining. Let me look at session patterns...",
"query": "SELECT user_id, total_sessions, days_active FROM ...",
"row_count": 100
}
]
{{DISCOVERY_DATE}}
Source: Current date
Example value: "2026-03-14"
{{INSIGHTS_SUMMARY}}
Source: All insights from the analysis phase
Text summary with counts per area.
Example value: "Total: 7 insights (churn: 3, engagement: 2, monetization: 2)"
{{INSIGHTS_DATA}}
Source: All validated insights from the analysis phase
Full JSON array of all insights, including their IDs. The LLM uses insight IDs to populate related_insight_ids on recommendations.
Example value:
[
{
"id": "churn-1",
"analysis_area": "churn",
"name": "Day 0-to-Day 1 Drop: 67% Never Return",
"description": "67% of new players...",
"severity": "critical",
"affected_count": 8298,
"risk_score": 0.67,
"confidence": 0.85,
"metrics": {"churn_rate": 0.67, "avg_sessions_before_churn": 1.2},
"indicators": ["Only 33% return after Day 1", "Avg session: 4.2 minutes"],
"source_steps": [1, 3, 5]
}
]
Variable Substitution Code
Variables are substituted in services/agent/internal/discovery/orchestrator.go using strings.ReplaceAll(). The substitution happens at runtime, just before sending prompts to the LLM.
The two dialect-aware tokens ({{DIALECT}} and {{REF:tablename}}) are applied by the shared substituteDialectTokens helper in services/agent/internal/discovery/prompt_subs.go. Every prompt site (exploration, analysis area, recommendations, base context) calls the helper after the regular ReplaceAll chain, so dialect awareness is wired in one place. The helper reads provider.SQLDialect() and provider.QuoteRef(...) on the warehouse provider, so a new warehouse provider inherits dialect-aware prompt rendering for free.
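Condensed, the runtime substitution amounts to a ReplaceAll chain followed by the dialect pass. A minimal sketch — function and parameter names here are illustrative, not the orchestrator's actual signatures, and the {{REF:...}} handling is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// renderPrompt mimics the orchestrator's substitution order: plain
// {{VARIABLE}} tokens first via strings.ReplaceAll, then the
// dialect-aware tokens. Names are illustrative stand-ins.
func renderPrompt(tmpl string, vars map[string]string, dialect string) string {
	out := tmpl
	for name, value := range vars {
		out = strings.ReplaceAll(out, "{{"+name+"}}", value)
	}
	// substituteDialectTokens equivalent: {{DIALECT}} comes from
	// provider.SQLDialect(); {{REF:...}} handling is omitted here.
	out = strings.ReplaceAll(out, "{{DIALECT}}", dialect)
	return out
}

func main() {
	tmpl := "Dataset: {{DATASET}}. Emit {{DIALECT}} queries."
	out := renderPrompt(tmpl,
		map[string]string{"DATASET": "analytics_data"},
		"BigQuery Standard SQL")
	fmt.Println(out) // Dataset: analytics_data. Emit BigQuery Standard SQL queries.
}
```

Because substitution is plain string replacement rather than a templating engine, unknown tokens survive verbatim in the rendered prompt, which is what makes malformed markers visible downstream.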
Next Steps
- Prompts Concept — How prompts are assembled and overridden
- Customizing Prompts — Edit prompts and add custom areas
- Creating Domain Packs — Write prompts for a new domain