Prompt Variables Reference
Version: 0.4.0
Template variables in prompt files use the {{VARIABLE_NAME}} syntax. The agent replaces them with project-specific values at runtime, just before sending each prompt to the LLM.
All Variables
| Variable | Used In | Replaced With | Type |
|---|---|---|---|
| {{PROFILE}} | base_context.md | JSON-encoded project profile | JSON string |
| {{PREVIOUS_CONTEXT}} | base_context.md | Previous discoveries + user feedback | Multi-line text |
| {{SCHEMA_INFO}} | exploration.md | Level-0 catalog only — one line per table (name, column count, row count, keyword hints). Per-table column lists and sample rows are fetched on demand by the LLM via lookup_schema / search_tables actions. | Plain text |
| {{DATASET}} | exploration.md, analysis_*.md | Dataset/schema names | Comma-separated string |
| {{FILTER}} | exploration.md | SQL WHERE clause | SQL fragment |
| {{FILTER_CONTEXT}} | exploration.md | Human-readable filter description | Text |
| {{FILTER_RULE}} | exploration.md | SQL construction rule for the filter | Text |
| {{ANALYSIS_AREAS}} | exploration.md | List of analysis areas with descriptions | Multi-line text |
| {{DIALECT}} | base_context.md, exploration.md, analysis_*.md, recommendations.md | Warehouse SQL dialect name returned by provider.SQLDialect() (e.g. "BigQuery Standard SQL", "Microsoft SQL Server T-SQL …", "PostgreSQL …"). | Plain text |
| {{REF:tablename}} | any prompt | Dialect-correct fully-qualified table reference produced by provider.QuoteRef(refDataset, "tablename") — e.g. `events_prod`.`sessions` on BigQuery, [dbo].[sessions] on SQL Server, "public"."sessions" on PostgreSQL. The identifier must match [A-Za-z_][A-Za-z0-9_]*; placeholders that don't match (empty name, embedded dot, leading digit) are passed through unchanged so a malformed marker is visible in the rendered prompt. | Plain text |
| {{TOTAL_QUERIES}} | analysis_*.md | Count of relevant exploration queries | Integer |
| {{QUERY_RESULTS}} | analysis_*.md | Exploration query results for this area | JSON array |
| {{DISCOVERY_DATE}} | recommendations.md | Current date (ISO format) | Date string |
| {{INSIGHTS_SUMMARY}} | recommendations.md | Text summary of insight counts | Text |
| {{INSIGHTS_DATA}} | recommendations.md | Full insight array with IDs | JSON array |
Detailed Reference
{{PROFILE}}
Source: project.profile from MongoDB
The project profile serialized as JSON. Contains domain-specific fields defined by the domain pack's profile schema.
Example value:
{
"basic_info": {
"genre": "puzzle",
"sub_genre": "match-3",
"platforms": ["iOS", "Android"],
"target_audience": "Adult Female 30-65+"
},
"gameplay": {
"core_mechanic": "match3",
"game_type": "level_based",
"session_type": "short",
"avg_session_duration_minutes": 6
},
"monetization": {
"model": "freemium",
"has_ads": true,
"has_iap": true
},
"boosters": [
{"name": "Magnet", "usage": "consumable", "starting_amount": 3, "can_purchase": true}
],
"kpis": {
"retention_d1_target": 40,
"arpu_target": 0.50
}
}
If profile is empty: shows "No project profile configured. Provide general analysis."
{{PREVIOUS_CONTEXT}}
Source: Last 5 discoveries + all feedback for the project
Built by the agent's buildPreviousContext() function. Contains:
- Discovery count and date: "This is discovery run #5. Last discovery: 2026-03-12."
- Previous insights: Names, areas, severity, dates — with dedup instruction
- Disliked insights: With user comments — "AVOID similar conclusions"
- Liked insights: "MONITOR for changes"
- Previous recommendations: "Don't repeat unless changed"
Example value:
This is discovery run #5. Last discovery: 2026-03-12.
### Previously Found Insights
These insights were already discovered. Do NOT repeat them unless the data has significantly changed.
- **Day 0-to-Day 1 Drop: 67% Never Return** [churn, critical] — 8298 affected (2026-03-11)
- **Level 11 Difficulty Cliff** [churn, high] — 642 affected (2026-03-12)
### User Feedback — Disliked Insights (AVOID)
- **Low engagement pattern** — user comment: "not relevant to our game"
### User Feedback — Liked Insights (MONITOR)
- **Day 0-to-Day 1 Drop: 67% Never Return**
### Previously Given Recommendations
Don't repeat these unless the situation has changed.
- P1: Reduce Level 11-15 Difficulty by 25% (difficulty)
If first run: Empty string (no previous context).
{{SCHEMA_INFO}}
Source: Schema renderer — Level-0 catalog only.
Compact catalog of every table the project indexed in Qdrant. One line per table: name | Nc | Xk/M rows | keyword1, keyword2 | -> joins_to. The catalog is sized to the project's prompt budget; archive-shaped tables (*_2023, *_LOG, *_BKP, *_TMP, …) and the smallest tables by row count are dropped first when the catalog exceeds the budget.
Example value:
analytics_data.users | 14c | 50K rows | users, accounts | -> sessions
analytics_data.sessions | 9c | 500K rows | sessions, activity | -> users, events
analytics_data.events | 22c | 4.2M rows | events, clickstream
This is the only schema content sent up-front. Column lists and sample rows are fetched on demand by the LLM during exploration:
- lookup_schema: pulls L1 detail (columns + 3 sample rows) for up to 10 tables per call. Per-run budget: 30 lookup calls.
- search_tables: runs a semantic query against the per-project Qdrant index. Per-run budget: 30 search calls. TopK clamped to 30.
Both budgets and the per-call cap are surfaced to the LLM in the initial exploration message; see docs/architecture/agent-on-demand-schema.md for the architectural rationale (the previous "always send L1" model exhausted the Bedrock 1M-token context on long runs).
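The clamping behavior described above (per-call table cap, per-run call budgets, TopK ceiling) can be pictured as a small counter the orchestrator decrements per call. The sketch below is illustrative only — the type and method names are hypothetical, not the agent's actual implementation:

```go
package main

import "fmt"

// schemaBudget tracks the per-run caps described above.
// All names here are illustrative stand-ins.
type schemaBudget struct {
	lookupCallsLeft int // 30 lookup_schema calls per run
	searchCallsLeft int // 30 search_tables calls per run
}

// allowLookup clamps a lookup_schema request: at most 10 tables
// per call, and no further calls once the per-run budget is spent.
func (b *schemaBudget) allowLookup(tables []string) ([]string, bool) {
	if b.lookupCallsLeft <= 0 {
		return nil, false
	}
	b.lookupCallsLeft--
	if len(tables) > 10 {
		tables = tables[:10]
	}
	return tables, true
}

// allowSearch clamps TopK to 30 and enforces the search budget.
func (b *schemaBudget) allowSearch(topK int) (int, bool) {
	if b.searchCallsLeft <= 0 {
		return 0, false
	}
	b.searchCallsLeft--
	if topK > 30 {
		topK = 30
	}
	return topK, true
}

func main() {
	b := &schemaBudget{lookupCallsLeft: 30, searchCallsLeft: 30}
	tables, ok := b.allowLookup([]string{"users", "sessions"})
	fmt.Println(ok, len(tables), b.lookupCallsLeft) // true 2 29
}
```

Because the budgets are surfaced to the LLM up front, the model can plan which tables to inspect instead of discovering the limits by hitting them.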
{{DATASET}}
Source: project.warehouse.datasets array
Comma-separated dataset names (BigQuery) or schema names (Redshift).
Example values:
- BigQuery: "analytics_data, features_prod"
- Redshift: "public"
{{FILTER}}
Source: project.warehouse.filter_field and filter_value
SQL WHERE clause for multi-tenant data filtering.
Example value: "WHERE app_id = '68a42f378e3b227c8e41b0e5'"
If no filter configured: Empty string.
{{FILTER_CONTEXT}}
Source: Same as {{FILTER}}
Human-readable explanation of the filter for the AI.
Example value: "Data is filtered to app_id='68a42f378e3b227c8e41b0e5'. Always include this filter in your queries."
{{FILTER_RULE}}
Source: Same as {{FILTER}}
SQL construction rule.
Example value: "Always include: WHERE app_id = '68a42f378e3b227c8e41b0e5' in all queries."
{{DIALECT}}
Source: provider.SQLDialect() on the project's connected warehouse provider.
The SQL dialect description rendered into every prompt so the LLM emits dialect-correct SQL on the first generation rather than round-tripping through the SQL-fix LLM call. It is also mirrored into the insight validator's verification prompt via the `**SQL Dialect**: %s` line.
Example values:
- BigQuery: "BigQuery Standard SQL"
- PostgreSQL: "PostgreSQL (ANSI SQL with extensions: DISTINCT ON, CTEs, window functions, JSON operators, array types)"
- SQL Server: "Microsoft SQL Server T-SQL (ANSI SQL with extensions: TOP, OFFSET/FETCH, CTEs, window functions, PIVOT/UNPIVOT, MERGE, CROSS APPLY, OUTER APPLY, XML/JSON functions)"
{{REF:tablename}}
Source: provider.QuoteRef(refDataset, "tablename") on the project's connected warehouse provider, where refDataset is the first dataset in project.warehouse.datasets.
A dialect-correct fully-qualified table reference rendered with the dialect's native identifier-quoting convention. Use this placeholder in every SQL example you author in exploration.md, analysis_*.md, or any custom prompt — the orchestrator substitutes it with the warehouse-specific shape so the LLM sees an example it can copy verbatim. The placeholder name must match [A-Za-z_][A-Za-z0-9_]* (a single SQL-style identifier; no embedded dots, no leading digit). Malformed placeholders pass through unchanged so the typo is visible in the rendered output rather than silently dropped.
Example values (for refDataset = "events_prod"):
- BigQuery / Databricks: `events_prod`.`sessions`
- PostgreSQL / Redshift / Snowflake: "events_prod"."sessions"
- SQL Server (MSSQL): [events_prod].[sessions]
Usage in prompts:
SELECT user_id FROM {{REF:sessions}}
JOIN {{REF:user_engagement_summary}} USING (user_id)
The agent's internal lookup_schema array continues to take the unquoted dataset.table form (["{{DATASET}}.users", ...]) — that's not warehouse SQL, so it doesn't need dialect-correct quoting.
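The identifier rule and pass-through behavior described above can be sketched with a single regex pass. In this sketch, quoteRef is a hard-coded stand-in for provider.QuoteRef (shown here with the BigQuery backtick shape); only the matching and pass-through logic mirrors the documented behavior:

```go
package main

import (
	"fmt"
	"regexp"
)

// refToken matches only well-formed {{REF:name}} markers: a single
// SQL-style identifier, so {{REF:bad.name}} or {{REF:1x}} never match
// and pass through unchanged, keeping the typo visible in the output.
var refToken = regexp.MustCompile(`\{\{REF:([A-Za-z_][A-Za-z0-9_]*)\}\}`)

// quoteRef is an illustrative stand-in for provider.QuoteRef,
// hard-coded to the BigQuery backtick convention.
func quoteRef(dataset, table string) string {
	return fmt.Sprintf("`%s`.`%s`", dataset, table)
}

// substituteRefs replaces every well-formed {{REF:...}} marker with a
// fully-qualified, dialect-quoted table reference.
func substituteRefs(prompt, refDataset string) string {
	return refToken.ReplaceAllStringFunc(prompt, func(m string) string {
		name := refToken.FindStringSubmatch(m)[1]
		return quoteRef(refDataset, name)
	})
}

func main() {
	in := "SELECT user_id FROM {{REF:sessions}} -- {{REF:bad.name}} stays"
	// Well-formed marker is quoted; the malformed one is left as-is.
	fmt.Println(substituteRefs(in, "events_prod"))
}
```

Passing malformed markers through unchanged is a deliberate fail-loud choice: a visible `{{REF:...}}` in a rendered prompt is easier to diagnose than a silently dropped table reference.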
{{ANALYSIS_AREAS}}
Source: Domain pack analysis areas
Formatted list of all analysis areas the agent should explore.
Example value:
- Churn Risks: Players at risk of leaving the game
- Engagement Patterns: Player behavior and session trends
- Monetization Opportunities: Revenue optimization and conversion opportunities
- Level Difficulty: Difficulty spikes and frustration points
- Booster Usage: Power-up usage patterns and purchase opportunities
{{TOTAL_QUERIES}}
Source: Count of exploration queries relevant to this analysis area
Example value: "6"
{{QUERY_RESULTS}}
Source: Exploration results filtered by area keywords
JSON array of exploration steps relevant to the current analysis area. Each entry includes the step number, timestamp, action, the model's thinking, the SQL query, its results, and timing metadata (row_count, execution_time_ms).
Example value:
[
{
"step": 1,
"timestamp": "2026-03-14T10:30:05Z",
"action": "query_data",
"thinking": "Let me check retention rates by cohort...",
"query": "SELECT cohort_date, day_1_retention FROM retention_cohorts WHERE app_id = '...' ORDER BY cohort_date DESC LIMIT 30",
"query_result": [
{"cohort_date": "2026-03-01", "day_1_retention": 33.2},
{"cohort_date": "2026-02-28", "day_1_retention": 31.8}
],
"row_count": 30,
"execution_time_ms": 450
},
{
"step": 3,
"thinking": "Retention is declining. Let me look at session patterns...",
"query": "SELECT user_id, total_sessions, days_active FROM ...",
"row_count": 100
}
]
{{DISCOVERY_DATE}}
Source: Current date
Example value: "2026-03-14"
{{INSIGHTS_SUMMARY}}
Source: All insights from the analysis phase
Text summary with counts per area.
Example value: "Total: 7 insights (churn: 3, engagement: 2, monetization: 2)"
{{INSIGHTS_DATA}}
Source: All validated insights from the analysis phase
Full JSON array of all insights, including their IDs. The LLM uses insight IDs to populate related_insight_ids on recommendations.
Example value:
[
{
"id": "churn-1",
"analysis_area": "churn",
"name": "Day 0-to-Day 1 Drop: 67% Never Return",
"description": "67% of new players...",
"severity": "critical",
"affected_count": 8298,
"risk_score": 0.67,
"confidence": 0.85,
"metrics": {"churn_rate": 0.67, "avg_sessions_before_churn": 1.2},
"indicators": ["Only 33% return after Day 1", "Avg session: 4.2 minutes"],
"source_steps": [1, 3, 5]
}
]
Variable Substitution Code
Variables are substituted in services/agent/internal/discovery/orchestrator.go using strings.ReplaceAll(). The substitution happens at runtime, just before sending prompts to the LLM.
The two dialect-aware tokens ({{DIALECT}} and {{REF:tablename}}) are applied by the shared substituteDialectTokens helper in services/agent/internal/discovery/prompt_subs.go. Every prompt site (exploration, analysis area, recommendations, base context) calls the helper after the regular ReplaceAll chain, so dialect awareness is wired in one place. The helper reads provider.SQLDialect() and provider.QuoteRef(...) on the warehouse provider, so a new warehouse provider inherits dialect-aware prompt rendering for free.
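Condensed, the runtime substitution amounts to a ReplaceAll chain followed by the dialect pass. A minimal sketch — function and parameter names here are illustrative, not the orchestrator's actual signatures, and the {{REF:...}} handling is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// renderPrompt mimics the orchestrator's substitution order: plain
// {{VARIABLE}} tokens first via strings.ReplaceAll, then the
// dialect-aware tokens. Names are illustrative stand-ins.
func renderPrompt(tmpl string, vars map[string]string, dialect string) string {
	out := tmpl
	for name, value := range vars {
		out = strings.ReplaceAll(out, "{{"+name+"}}", value)
	}
	// substituteDialectTokens equivalent: {{DIALECT}} comes from
	// provider.SQLDialect(); {{REF:...}} handling is omitted here.
	out = strings.ReplaceAll(out, "{{DIALECT}}", dialect)
	return out
}

func main() {
	tmpl := "Dataset: {{DATASET}}. Emit {{DIALECT}} queries."
	out := renderPrompt(tmpl,
		map[string]string{"DATASET": "analytics_data"},
		"BigQuery Standard SQL")
	fmt.Println(out) // Dataset: analytics_data. Emit BigQuery Standard SQL queries.
}
```

Because substitution is plain string replacement rather than a templating engine, unknown tokens survive verbatim in the rendered prompt, which is what makes malformed markers visible downstream.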
Next Steps
- Prompts Concept — How prompts are assembled and overridden
- Customizing Prompts — Edit prompts and add custom areas
- Creating Domain Packs — Write prompts for a new domain