# mcp analyst agent

## Purpose
Agent tools and workflows for AI-assisted analysis using Dataface context. This workstream builds the MCP server, tool definitions, and prompt workflows that let AI agents (in Cursor, Claude, etc.) interact with Dataface — inspecting schemas, generating dashboards, running queries, and iterating on analysis. The goal is that an analyst can describe what they want in natural language and an AI agent produces a working dashboard or analysis. Adjacent to inspect-profiler (which provides the data context the agent uses) and context-catalog-nimble (which defines how context is structured and surfaced).
## Owner
- Data AI Engineer Architect
## Tasks by Milestone
A runnable prototype path exists for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning, with concrete artifacts that prove the flow works end-to-end in the current codebase. Core assumptions are documented, known constraints are explicit, and the team can explain what is real versus mocked without ambiguity.
- Prototype gaps and follow-on capture — Document top gaps and risks in eval and guardrail framework that must be addressed next.
- Prototype implementation path — Implement a runnable end-to-end prototype path for MCP tool execution model.
- Prototype validation and proof — Validate agent prompt/workflow behavior with concrete proof artifacts and repeatable steps.
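As a sanity check on the prototype path, the tool execution model can be exercised with a minimal in-process sketch. The tool names and payloads below (describe_schema, render_dashboard) are illustrative assumptions, not the real Dataface MCP surface:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolResult:
    ok: bool
    payload: Any

# Hypothetical handlers standing in for real Dataface MCP tools.
def describe_schema(args: dict) -> ToolResult:
    # A real version would inspect the warehouse; this returns a canned schema.
    return ToolResult(True, {"tables": {"orders": ["id", "amount", "created_at"]}})

def render_dashboard(args: dict) -> ToolResult:
    # A real version would validate and render dashboard YAML; this echoes a stub artifact.
    return ToolResult(True, {"rendered": True, "yaml": args["yaml"]})

TOOLS: dict[str, Callable[[dict], ToolResult]] = {
    "describe_schema": describe_schema,
    "render_dashboard": render_dashboard,
}

def execute_tool(name: str, args: dict) -> ToolResult:
    # Uniform entry point an agent loop would call for every tool invocation.
    if name not in TOOLS:
        return ToolResult(False, f"unknown tool: {name}")
    return TOOLS[name](args)
```

The proof artifact for the milestone is a transcript of such calls against the real codebase; the sketch only pins down the shape of the loop (named tools, structured results, explicit failure for unknown tools).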
Internal analysts can execute at least one weekly real workflow that depends on AI agent tool interfaces, execution workflows, and eval-driven behavior tuning in the 5T Analytics environment, without bespoke engineering intervention for every run. Instrumentation and feedback capture are in place so failures, friction points, and adoption gaps are visible and triaged with owners.
- Extract shared chat.js and chat_stream SSE endpoint — Extract the shared chat component chat.js and chat_stream SSE endpoint as a standalone M1 task. This resolves the depen…
- MCP tooling contract for extension + Copilot dashboard/query generation — Define and harden MCP tool inputs/outputs so extension and Copilot can reliably generate dashboards and queries in pilo…
- Unify Cloud AI Tool Dispatch to Use Canonical MCP Tools (AI Agent Surfaces) — Replace the bespoke _execute_tool_sync() in apps/cloud/apps/ai/views.py (which only supports 4 tools: validate_yaml, te…
- Wire Playground AI to use MCP tools instead of bespoke tool set — The Playground app currently maintains its own bespoke AI tools - validate_yaml, test_yaml_execution, execute_query_res…
- Add JSON render output format — Add format=json to the render pipeline that walks the layout tree, executes queries, resolves charts, and returns the r…
- Refactor Cloud AI chat stream into scoped execution services — Refactor apps/cloud/apps/ai/views.py chat_stream into smaller scope-resolution, tool-execution, and SSE-streaming units…
- Replace AI tool dispatch switch with registry-backed handlers — Refactor dataface/ai/tools.py so canonical tool schemas and handlers are registered in one place instead of maintained…
- Save dashboard MCP tool - persist agent work to project — Add a save_dashboard MCP tool that writes agent-generated YAML to the project file system. Currently all tools are stat…
- Scope playground MCP surface to playground sources — Refactor the shared AI/MCP surface to accept an injected context for adapter registry, dashboard directory, base dir, a…
- Wire Dataface to internal analytics repo and BigQuery source — Set up the Dataface-side access path to the internal analytics warehouse and sibling analytics dbt repo. Use /Users/dav…
- Add resolved YAML render output format — Add a format=yaml output that produces a resolved dataface YAML -- auto chart types filled in, auto-detected fields exp…
- Type terminal agent event protocol and provider stream adapters — Refactor the terminal agent loop introduced in dataface/ai/agent.py and dataface/ai/llm.py to use explicit typed event…
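The registry-backed dispatch task above can be sketched as follows; the decorator pattern and tool names are assumptions about what dataface/ai/tools.py might converge on, not its actual API:

```python
from typing import Any, Callable

# Single registry mapping tool name -> (schema, handler); replaces an if/elif dispatch switch.
_REGISTRY: dict[str, tuple[dict, Callable[..., Any]]] = {}

def tool(name: str, schema: dict):
    """Register a handler together with its canonical input schema."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[name] = (schema, fn)
        return fn
    return decorator

@tool("validate_yaml", schema={"type": "object", "properties": {"yaml": {"type": "string"}}})
def validate_yaml(yaml: str) -> dict:
    # A real implementation would parse and validate; stubbed for the sketch.
    return {"valid": bool(yaml.strip())}

def dispatch(name: str, args: dict) -> Any:
    # Dict lookup instead of a hand-maintained switch.
    if name not in _REGISTRY:
        raise KeyError(f"unregistered tool: {name}")
    _schema, handler = _REGISTRY[name]
    return handler(**args)

def list_tools() -> list[dict]:
    # The same registry serves MCP tool listings, so schemas cannot drift from handlers.
    return [{"name": n, "inputSchema": s} for n, (s, _fn) in _REGISTRY.items()]
```

The design payoff is that Playground, Cloud chat, and the MCP server would all read from one registry, which is exactly the unification the dispatch and Playground tasks above describe.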
AI agent tool interfaces, execution workflows, and eval-driven behavior tuning are hardened enough for regular use by multiple internal teams and initial design partners, with a predictable response loop for issues and requests. Quality expectations are documented, and prioritized improvements from real usage are actively incorporated into delivery.
- Add 'describe' or 'text' render output format for AI agents — Add a describe/text render mode so AI agents can request compact textual dashboard outputs instead of visual payloa…
- Adoption hardening for internal teams — Harden MCP tool execution model for repeated use across multiple internal teams and first design partners.
- Build text-to-SQL eval runner and deterministic scorer — Build a Dataface text-to-SQL eval harness that runs agent/model prompts against the cleaned benchmark and scores output…
- Chat-First Home Page - Conversational AI Interface for Dataface Cloud (AI Agent Surfaces) — Replace the current org home page (dashboard grid) with a chat-first interface. The home screen shows existing dashboar…
- Create cleaned dbt SQL benchmark artifact — Create a reproducible benchmark-prep step that imports the raw dbt dataset from cto-research, filters out AISQL rows, r…
- Design-partner feedback loop operations — Operationalize rapid feedback-to-fix loop for agent prompt/workflow behavior with explicit decision logs.
- Embeddable Dashboards in Chat - Inline Preview, Modal Expand, and Save to Repo (AI Agent Surfaces) — Dashboards generated during chat conversations can be embedded inline as interactive previews. Users click to expand in…
- Extract shared text-to-SQL generation function (Benchmark-Driven Text-to-SQL and Discovery Evals) — Extract a shared generate_sql(question, context_provider, model) function, wire render_dashboard and cloud AIService to…
- MCP and skills auto-install across all AI clients — Expand dft mcp init to cover VS Code, Claude Code, and GitHub Copilot Coding Agent. Register MCP server programmaticall…
- Quality standards and guardrails — Define and enforce quality standards for eval and guardrail framework to keep output consistent as contributors expand.
- Run agent eval loop with internal analysts — Establish repeatable agent-level eval workflow that tests the full loop (prompt → tool use → SQL generation → dashboard…
- Set up eval leaderboard dft project and dashboards (Benchmark-Driven Text-to-SQL and Discovery Evals) — Create a dft project inside the eval output directory with dashboard faces that visualize eval results as a leaderboard…
- Terminal Agent TUI - dft agent — Build a Claude Code-like terminal AI agent as a dft subcommand. The agent comes pre-loaded with Dataface MCP tools and…
- Add catalog discovery evals derived from SQL benchmark — Adapt the dbt SQL benchmark into search/catalog discovery eval cases by extracting expected tables from gold SQL and ge…
- Add persistent analyst memories and learned context (AI Quality Experimentation and Context Optimization) — Design and implement a memories file that accumulates knowledge from analyst queries — table quirks, column semantics,…
- Chat Conversation Persistence and History (AI Agent Surfaces) — Add ChatSession and ChatMessage Django models so chat conversations survive page refreshes. Show recent conversations i…
- Curate schema and table scope for eval benchmark (AI Quality Experimentation and Context Optimization) — Decide which schemas, tables, and data layers (raw, silver/staging, gold/marts) to include in the eval scope and catalo…
- Persist eval outputs for Dataface analysis and boards — Define the canonical eval artifact schema for run metadata, per-case results, retrieval results, and summaries. Add loa…
- Run context and model ablation experiments (AI Quality Experimentation and Context Optimization) — Define and execute the initial experiment matrix using the eval system. Compare models (GPT-4o, GPT-5, Claude Sonnet, e…
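The deterministic scorer for the text-to-SQL eval runner can be as simple as execution matching against a fixture database. This sketch uses sqlite3 as a stand-in for the real warehouse, and the scoring rule (same multiset of rows, column order ignored) is an assumption, not the harness's actual metric:

```python
import sqlite3

def execution_match(gold_sql: str, pred_sql: str, setup_sql: str) -> bool:
    """Deterministic scorer: two queries match if they return the same
    multiset of rows against the same fixture data. Column order is
    ignored by sorting values within each row (a simplification that
    can conflate columns of the same type)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)

    def rows(sql: str):
        try:
            fetched = conn.execute(sql).fetchall()
        except sqlite3.Error:
            return None  # invalid SQL scores as a non-match
        return sorted(tuple(sorted(map(repr, r))) for r in fetched)

    gold, pred = rows(gold_sql), rows(pred_sql)
    conn.close()
    return gold is not None and gold == pred

# Hypothetical fixture; real eval cases would come from the cleaned dbt benchmark.
FIXTURE = """
CREATE TABLE orders (id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (3, 5.0);
"""
```

Because the scorer executes rather than string-compares, semantically equivalent SQL with different formatting still scores as correct, which keeps the leaderboard stable across prompt and model changes.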
Launch scope for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning is complete, externally explainable, and supportable: user-facing behavior is stable, documentation is publishable, and operational ownership is explicit. Remaining gaps are non-blocking, risk-assessed, and tracked as post-launch follow-up rather than unresolved launch debt.
- Launch docs and external readiness — Publish external-facing documentation and examples for agent prompt/workflow behavior that are executable by new users.
- Launch operations and reliability readiness — Finalize operational readiness for eval and guardrail framework: telemetry, alerting, support ownership, and incident p…
- Public launch scope completion — Complete launch-critical scope for MCP tool execution model with production-safe behavior and rollback clarity.
- Desktop app - lightweight wrapper around Dataface Cloud web UI — Build a desktop application that wraps the Dataface Cloud web interface. Provides native OS integration like menu bar,…
- Patch-based AI edits for dashboard YAML — Instead of AI regenerating entire YAML files when refining dashboards, support targeted YAML patches inspired by json-r…
- Schema-derived AI prompts from compiled types — Auto-generate LLM system prompts from the Dataface schema definition rather than hand-maintaining prompt templates. Ins…
- Skill and tool quality evaluation framework — Build a framework to A/B test whether individual MCP skills improve agent output quality vs raw tool access. Measure sk…
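The patch-based YAML edit task above admits a small sketch: json-patch-style operations applied to the parsed dashboard document rather than regenerating the whole file. The op format loosely mirrors RFC 6902; apply_patch and the path convention are hypothetical, not the planned implementation:

```python
from copy import deepcopy

def apply_patch(doc: dict, ops: list[dict]) -> dict:
    """Apply add/replace/remove ops to a parsed dashboard document.
    Paths are '/'-separated; list indices are numeric segments.
    Returns a new document; the input is left untouched."""
    doc = deepcopy(doc)
    for op in ops:
        *parents, leaf = [p for p in op["path"].split("/") if p]
        target = doc
        for part in parents:
            target = target[int(part)] if isinstance(target, list) else target[part]
        if isinstance(target, list):
            idx = int(leaf)
            if op["op"] == "remove":
                target.pop(idx)
            elif op["op"] == "add":
                target.insert(idx, op["value"])
            else:  # replace
                target[idx] = op["value"]
        else:
            if op["op"] == "remove":
                del target[leaf]
            else:  # add and replace behave alike on dicts here
                target[leaf] = op["value"]
    return doc
```

The agent then emits only the ops for the refinement it was asked for, e.g. `{"op": "replace", "path": "/charts/0/type", "value": "line"}`, which is cheaper and far less likely to clobber unrelated parts of the YAML than full regeneration.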
Post-launch stabilization is complete for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning: recurring incidents are reduced, support burden is lower, and quality gates are enforced consistently before release. The team has a repeatable operating model for maintenance, regression prevention, and measured reliability improvements.
- Regression prevention and quality gates — Add or enforce regression gates around agent prompt/workflow behavior so release quality is sustained automatically.
- Sustainable operating model — Document and adopt sustainable operating model for eval and guardrail framework across support, triage, and release cad…
- v1.0 stability and defect burn-down — Run stability program for MCP tool execution model with recurring defect burn-down and reliability trend tracking.
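A regression gate of the kind described above can start as a baseline comparison run in CI against stored eval results. The metric names and drop threshold below are illustrative:

```python
def regression_gate(baseline: dict[str, float], current: dict[str, float],
                    max_drop: float = 0.02) -> tuple[bool, list[str]]:
    """Fail the release if any tracked eval metric drops more than
    max_drop (absolute) below its stored baseline, or goes missing
    from the current run entirely."""
    failures = []
    for metric, base in baseline.items():
        cur = current.get(metric)
        if cur is None:
            failures.append(f"{metric}: missing from current run")
        elif base - cur > max_drop:
            failures.append(f"{metric}: {base:.3f} -> {cur:.3f}")
    return (not failures, failures)
```

Wiring this to the persisted eval artifacts from the earlier milestone gives the "enforced consistently before release" property for free: the gate reads the same data the leaderboard dashboards do.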
v1.2 delivers meaningful depth improvements in AI agent tool interfaces, execution workflows, and eval-driven behavior tuning based on observed usage and retention signals, not just roadmap intent. Enhancements improve real customer outcomes, and release readiness is demonstrated through metrics, regression coverage, and clear migration guidance where relevant.
- Add eval loop for dashboard search and variable-scoped navigation — Build a repeatable eval loop for dashboard search that measures not only whether the right dashboard is retrieved, but…
- Expand dashboard search to return variable-scoped deep links — Extend dashboard search and handoff flows so agents and users can navigate to an existing dashboard with explicit varia…
- Quality and performance improvements — Ship measurable quality/performance improvements in agent prompt/workflow behavior tied to user-facing outcomes.
- v1.2 depth expansion — Deliver depth expansion in MCP tool execution model prioritized by observed usage and retention outcomes.
- v1.2 release and migration readiness — Prepare v1.2 release/migration readiness for eval and guardrail framework, including communication and upgrade guidance.
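Variable-scoped deep links could be encoded as query parameters so agents and users land on an existing dashboard with its variables pre-applied. The var. prefix and URL layout below are assumed conventions, not the shipped format:

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

def dashboard_link(base_url: str, dashboard: str, variables: dict[str, str]) -> str:
    """Build a deep link that opens a dashboard with variables pre-applied.
    Variables are sorted so equivalent scopes produce identical URLs."""
    query = urlencode({f"var.{k}": v for k, v in sorted(variables.items())})
    return f"{base_url}/dashboards/{dashboard}?{query}"

def parse_variables(url: str) -> dict[str, str]:
    """Recover the variable scope from a deep link (round-trip of the above)."""
    return {k[len("var."):]: v
            for k, v in parse_qsl(urlsplit(url).query)
            if k.startswith("var.")}
```

Making the links canonical (sorted, prefixed) also makes the eval loop in the sibling task easier: an eval case can assert on the exact URL the agent hands off, not just which dashboard it picked.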
Long-horizon opportunities for AI agent tool interfaces, execution workflows, and eval-driven behavior tuning are captured as concrete hypotheses with user impact, prerequisites, and evaluation criteria. Ideas are ranked by strategic value and feasibility so future investment decisions can be made quickly with less rediscovery.
- Experiment design for future bets — Design validation experiments for eval and guardrail framework so future bets can be tested before major investment.
- Future opportunity research — Capture long-horizon opportunities for MCP tool execution model with user impact and strategic fit.
- Prerequisite and dependency mapping — Map enabling prerequisites and dependencies for agent prompt/workflow behavior to reduce future startup cost.
- Streaming YAML generation with early query execution — When AI generates a dashboard, stream the YAML and begin executing queries as they arrive rather than waiting for the f…
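The streaming-YAML idea can be prototyped by emitting each completed top-level block as it arrives, so query execution starts before generation finishes. This is a line-oriented sketch under the assumption of well-formed, indentation-based YAML; it is not a real incremental YAML parser:

```python
from typing import Iterable, Iterator

def top_level_blocks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each completed top-level YAML block as soon as the stream
    moves on to the next one. A block is 'complete' when a new
    non-indented, non-comment line starts."""
    buffer, block = "", []
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            starts_new_block = line and not line.startswith((" ", "#"))
            if starts_new_block and block:
                yield "\n".join(block)  # downstream can execute its queries now
                block = []
            block.append(line)
    if buffer:
        block.append(buffer)
    if block:
        yield "\n".join(block)
```

A consumer would hand each yielded block to the render pipeline immediately, so by the time the model finishes the last chart, the first chart's query results are already back.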