Turn every discovery
into training data.
DecisionBox records everything the agent does — SQL queries, schema lookups, validation retries, errors and their fixes, every finding, every feedback thumb you've given. Export it, fine-tune an open model on it, and the next version of the agent knows your warehouse like a senior analyst does.
Everything the agent does is written down
Not just the findings that ship — the full trace. Eight distinct record types, all structured, all exportable.
Discovery runs
Every step of every run — explore, query, analyze, decide — with the agent's reasoning attached.
Insights
Structured findings with severity, indicators, affected counts, and validation state.
Recommendations
Priority, expected impact, target segment, and the numbered action steps behind each one.
Schema exploration
Which tables and columns the agent inspected, in what order, and why it chose them.
SQL queries
Every query executed, with the results it returned and how they were used downstream.
SQL fix history
Errors paired with their warehouse-specific rewrites — a rich corpus for teaching SQL generation.
Debug logs
Agent-internal reasoning per step: hypotheses formed, rejected, refined.
User feedback
Thumbs-up and thumbs-down on insights and recommendations — natural preference data.
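All eight record types export as structured JSON. As a rough illustration, here is what a single run-step record might look like — every field name below is hypothetical, not DecisionBox's actual schema:

```python
import json

# Hypothetical shape of one exported discovery-run step record.
# All field names are illustrative, not DecisionBox's actual schema.
record = {
    "record_type": "discovery_run_step",
    "run_id": "run_0042",
    "step": "query",
    "reasoning": "Order totals look skewed; checking for duplicate rows.",
    "sql": "SELECT order_id, COUNT(*) FROM fact_orders "
           "GROUP BY order_id HAVING COUNT(*) > 1",
    "result_rows": 17,
    "warehouse_type": "snowflake",
}

# One JSON object per line (JSONL) is the usual interchange shape.
line = json.dumps(record)
print(json.loads(line)["step"])  # → query
```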
Not just logs — curated training data
Typical fine-tuning datasets are scraped or synthetic. Yours is neither. Every record DecisionBox exports has been validated against your warehouse, feedback-rated by your team, or produced by an agent run that completed successfully.
Every insight's numbers are re-queried against your warehouse before the insight ships. The training set is confirmed ground truth, not model output.
Thumbs-up and thumbs-down on every insight and recommendation. Your team's preferences are attached — DPO-ready out of the box.
Every SQL query ran against your warehouse. Every fix was confirmed to work. Your model learns what actually executes, not what looks right.
Domain pack, schema, and warehouse type are attached to every record. The model learns your data, not just abstractions.
Training on this is training on confirmed ground truth. Not noise, not hallucinated chatter, not unverified model output.
Your team's thumbs-ups go two places
Into the next discovery
Feedback is part of the agent's context on the next run. It learns which patterns your team cares about, which findings you've already shipped, and which cuts you don't find useful.
Into your training set
Every thumbs-up is a labeled positive example; every thumbs-down, a counter-example. This is natural preference data — exactly what modern fine-tuning methods (DPO, ORPO, KTO) are built to consume.
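In DPO terms, a thumbs-up and a thumbs-down on responses to the same prompt fold into one prompt/chosen/rejected triple. A stdlib-only sketch — the feedback field names are assumptions, not DecisionBox's export schema:

```python
import json

# Two hypothetical feedback-rated insights on the same prompt:
# one thumbs-up, one thumbs-down. Field names are illustrative.
feedback = [
    {"prompt": "Summarize anomalies in fact_orders for last week.",
     "response": "Duplicate order_ids inflated revenue by 3%; dedupe on order_id.",
     "rating": "up"},
    {"prompt": "Summarize anomalies in fact_orders for last week.",
     "response": "Revenue is up. No issues found.",
     "rating": "down"},
]

def to_dpo(pos, neg):
    """Fold one up/down pair into the DPO triple format."""
    assert pos["prompt"] == neg["prompt"]
    return {"prompt": pos["prompt"],
            "chosen": pos["response"],
            "rejected": neg["response"]}

triple = to_dpo(feedback[0], feedback[1])
print(json.dumps(triple, indent=2))
```

The resulting triples can be written straight to JSONL and consumed by any DPO-style trainer.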
Works with the open model you already run
Export is framework-agnostic JSON/Parquet. Any model family that accepts instruction-tuning data works — here's the short list we've validated against.
Anything on HuggingFace that accepts a standard SFT or DPO dataset will work — these are just the families we've seen customers choose.
Two ways to actually fine-tune
Pick the workflow that matches how your team operates.
Train it yourself
Export the dataset as JSON or Parquet. Run your preferred stack — TRL, Unsloth, Axolotl, with LoRA or QLoRA for parameter-efficient tuning — on your own compute. Best if you already have ML infra and a flow your team likes.
- JSON / Parquet export
- SFT + DPO dataset formats
- Your GPUs, your timeline
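The self-serve path boils down to: read the export, map records into the shape your trainer expects, train. A minimal sketch of the SFT side using only the standard library — the record fields and output filename here are assumptions for illustration:

```python
import json

# Hypothetical exported SQL fix-history records (error paired
# with its warehouse-specific rewrite). Field names are illustrative.
export = [
    {"record_type": "sql_fix",
     "failed_sql": "SELECT * FROM fact_orders LIMIT 10 OFFSET",
     "error": "syntax error at end of input",
     "fixed_sql": "SELECT * FROM fact_orders LIMIT 10"},
]

def to_sft(rec):
    """Map one error-fix pair into a prompt/completion SFT example."""
    prompt = (f"The following SQL failed with: {rec['error']}\n"
              f"{rec['failed_sql']}\n"
              f"Rewrite it so it executes.")
    return {"prompt": prompt, "completion": rec["fixed_sql"]}

with open("sft_dataset.jsonl", "w") as f:
    for rec in export:
        if rec["record_type"] == "sql_fix":
            f.write(json.dumps(to_sft(rec)) + "\n")
```

A JSONL file of prompt/completion pairs like this loads into TRL, Unsloth, or Axolotl with little or no extra glue.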
Train it from the UI you already use
Pick a base model, click train. Dataset assembly, LoRA / QLoRA orchestration, evaluation against your held-out feedback, and model versioning — all automated. You still bring the compute; we handle the pipeline.
- One-click dataset build
- LoRA / QLoRA out of the box
- Eval against held-out feedback
- Versioned weights, swappable at runtime
The model that knows your warehouse beats the one that knows the internet
SQL tuned to your schema
Generic models guess your columns. A model fine-tuned on your run history already knows what fact_orders looks like and which joins matter.
Run it on your hardware
A 7B–14B fine-tuned model can match a frontier API for your specific task. Lower latency, lower cost, fewer surprises in the bill.
Training data stays with you
The dataset exports into your infrastructure. The fine-tuning job runs on your compute. The resulting weights are yours.
Self-healing SQL, learned
Every error-fix pair the agent logged becomes training signal. Your model learns your warehouse's quirks without being told.
What we do and don't handle
We capture the data, structure it, and export it. We don't supply compute — you bring your own GPU or cloud. We don't host the fine-tuned model — it's yours to deploy wherever your agent runs (locally via Ollama, vLLM, or a managed endpoint). Until the training tool ships, you'll convert the export to your framework's format — a short adapter script for most stacks.
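The adapter script is typically a few lines: map each exported record into your framework's chat-message format, then let the tokenizer's chat template do the rest. A sketch assuming a generic record shape — the field names are ours, not DecisionBox's:

```python
import json

# Hypothetical exported insight record; field names are illustrative.
record = {
    "question": "Which segment drives churn?",
    "finding": "Monthly-plan users under 3 months tenure churn at 2x baseline.",
}

# Convert to the role/content messages shape most fine-tuning stacks
# accept when applying a tokenizer's chat template.
example = {"messages": [
    {"role": "user", "content": record["question"]},
    {"role": "assistant", "content": record["finding"]},
]}

print(json.dumps(example))
```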