BigQuery Agent Analytics — Roadmap
A tiered roadmap built from a survey of the 19 open issues on this repo, the
22 sections of the current SDK.md user manual, the 38 Python modules in src/bigquery_agent_analytics/, and the ADK plugin
(google/adk/plugins/bigquery_agent_analytics_plugin.py, ~3,500 LOC, 14
lifecycle callback hooks, BigQuery Storage Write API path with GCS offload).
Filed for discussion. Estimates are calibrated to a single experienced
SDK / plugin engineer; multiply for parallel streams. Impact and effort
are best-effort; the implementer of each item will refine them.
Ground rules for this doc. Impact = H/M/L on a "downstream user
adoption + DevX uplift" axis, not revenue. Effort = engineer-weeks
(full-time equivalent), confidence band in parens. Items marked strategic decision pending need a maintainer call before
sequencing.
Director summary
For an exec audience, the actionable shape is:
Next 30 days (1 engineer): P0 (~2 eng-weeks) plus Quality Scorecard
Phase 1 (~2 eng-weeks). That's the realistic month. Don't promise
more on a single track.
Next quarter (1 engineer): P0 + most/all of P1 (~15 eng-weeks
total). No P2 unless P1 scope is cut. If a strategic bet
matters more than completing P1, the realistic trade is to drop
one P1 item (typically inheritance compilation) and pull
ReasoningBank up — but pick one, not both.
Next quarter (2 engineers in parallel): ship all P0/P1 work and
fund ReasoningBank as the first strategic P2 bet — it converts
the existing agent_improvement_cycle demo into a real product
surface and has lower execution risk than ontology consolidation.
Defer bigquery_ontology migration (Migrate SDK ontology pipeline to bigquery_ontology package #38) until its ADR is approved;
treat it as the next quarter's strategic bet, not this one.
Always P3: auto-skills, advanced resolvers, Spanner backends,
V5 context graph. Research arm or partner-team work; don't block
production roadmap on them.
Reality check on calendar math: P0 ≈ 2 eng-weeks, P1 ≈ 13 eng-weeks, P0+P1 ≈ 15 eng-weeks ≈ 1 quarter for 1 engineer, not 1 month.
Earlier drafts of the resource table had this wrong; the table below
is corrected.
TL;DR
Three workstreams are already mature (trace reconstruction and deterministic evaluation among them); keep them stable.
One workstream is the growth engine: closing the evaluation loop — quality scorecard (Proposal: Automated Agent Quality Scorecard #63), automated benchmarking (Data Science Features for BQ Agent Analytics #95), the agent improvement cycle (already shipped as a demo), ReasoningBank (BigQuery Agent Analytics SDK — ReasoningBank #49). This is the user-facing differentiator over the next quarter.
One workstream is high-leverage but quietly under-invested: plugin telemetry. The cache-hit-rate metric (Improve token efficiency in ADK via BigQuery Agent Analytics #32) is a one-week win that unlocks the cost-optimization narrative and ties directly into the existing CodeEvaluator surface.
One workstream has the most open design surface: ontology, with 8 issues across #12, #30, #38, #57, #58, #75, #76, #93. Multiple design proposals are gating implementation. Constraint: at most one ontology bet per quarter (see the resource table below).
One workstream needs a strategic decision before any code: the ontology pipeline migration to upstream bigquery_ontology (Migrate SDK ontology pipeline to bigquery_ontology package #38) — a runtime contract migration, not a module swap; needs a go/no-go.
Method
How I built this:
Read each open issue's body + first 2-3 comments. Categorized by
workstream.
Surveyed src/bigquery_agent_analytics/ (38 modules) and SDK.md
(22 sections, ~1700 lines) to understand what already ships.
Read the ADK plugin to understand the upstream telemetry contract
the SDK consumes.
Tiered each item by impact × effort, then sequenced for parallel
workstreams.
Flagged items that need a strategic call before scoping (marked strategic decision pending).
Workstream map (where the open issues live)
Event capture: agent_events; GCS offload; PyArrow schema.
Trace reconstruction: Client.get_session_trace, list_traces, tree render.
Evaluation: agent_improvement_cycle example, quality_report.py.
Insights & memory: client.insights(), drift detection, memory_service.
Ontology: gm CLI, OWL importer, DDL compiler, materializer, compile_concept_index.
Docs & DevRel: bq-agent-sdk CLI, SDK.md, examples/, blog series.
P0 — ship in ≤2 weeks (low-risk, asked-for)
Fix the feedback="..." snippet in evaluate --exit-code FAIL output (escaping of " and \). Surfaced from blog #3 live capture.
Context-cache hit-rate evaluator (Improve token efficiency in ADK via BigQuery Agent Analytics #32). Upstream google/adk-python@main already extracts cached_content_token_count from usage_metadata and exposes usage_cached_tokens in plugin-generated views (verified at bigquery_agent_analytics_plugin.py:1829). Remaining SDK work: add CodeEvaluator.context_cache_hit_rate(min_rate), document the minimum required ADK plugin version, fall back gracefully when the column is missing on older agent_events tables, and write tests. Strong tie-in to post #2's cost narrative.
Posts #1, #2, and #3 are all live on the Google Cloud Medium publication. #82 closed; #51 updated through slot 3. Remaining: confirm #53 / #77 final-state checkbox sweep.
P0 total: ~2 eng-weeks. (Item-level estimates round up to ~2.25 because the cache-hit-rate evaluator is still a small SDK polish even with the plugin work done.)
P1 — ship in 1 quarter (high leverage, 1 eng) or 1 month (2 eng parallel)
Quality Scorecard Phase 1: named rubrics (evaluation_rubrics.py) over the existing CategoricalEvaluator. Already in flight — Gayathri uploaded evaluation_rubrics.py for review. Locks in three pillars (response_usefulness, task_grounding, policy_compliance) using the existing categorical vocabulary. Additive; reuses existing dashboard views.
Quality Scorecard Phase 2: persist root_agent_name + region on categorical_results + new categorical_fleet_leaderboard view. Needs an ALTER TABLE migration plan. Bridges the eval-results → fleet-ranking gap the SDK doesn't have today.
Quality Scorecard Phase 3: Client.triage_low_score_sessions(...) + hitl_triage_queue table. The genuinely net-new piece. Decide upfront: idempotent MERGE vs. append-only? resolved_at lifecycle column? Worth a small design note before implementation.
Inheritance (extends) compilation in the ontology DDL compiler (feat: implement inheritance (extends) compilation support #30)
Three candidate strategies named in the issue (fan-out / union view / label-referenced edges). Decide one, implement, ship as gm compile --emit-extends-as=… flag.
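Of the three candidate strategies, the union-view option can be sketched as follows. The dataset name, the `_all` view suffix, and the `source_concept` column are invented for illustration; this is a toy of one candidate, not the chosen design.

```python
# Toy sketch of the "union view" strategy for `extends` compilation:
# queries against a parent concept's view see rows from all transitive
# child-concept tables. Naming conventions here are assumptions.

def emit_extends_union_view(concept: str, children: dict[str, list[str]],
                            dataset: str = "ontology") -> str:
    """Emit a view unioning a concept's table with all descendant tables."""
    tables = [concept]
    stack = list(children.get(concept, []))
    while stack:                      # walk the extends hierarchy
        child = stack.pop()
        tables.append(child)
        stack.extend(children.get(child, []))
    selects = "\nUNION ALL\n".join(
        f"SELECT *, '{t}' AS source_concept FROM `{dataset}.{t}`"
        for t in tables)
    return f"CREATE OR REPLACE VIEW `{dataset}.{concept}_all` AS\n{selects}"
```

The trade-off versus fan-out is the usual one: the view stays cheap to emit and always fresh, but requires the child tables to share a column set, which is exactly what inheritance should guarantee.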
evaluate --suggest-thresholds baseline helper (deferred from blog #2 polish). No issue yet — file one. Impact: M. Effort: 1 wk (high confidence).
Reads last N days of prod, prints suggested per-metric thresholds with a buffer. Halves the prose burden of blog #2's "how do I pick thresholds" sidebar.
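A minimal sketch of what the suggestion logic could compute, assuming a 10th-percentile baseline relaxed by a 10% buffer — both assumptions, since the item above doesn't fix the statistic:

```python
import statistics

# Illustrative sketch of `evaluate --suggest-thresholds` logic. The
# percentile choice and buffer are assumptions, not a spec.

def suggest_thresholds(history: dict[str, list[float]],
                       buffer: float = 0.10) -> dict[str, float]:
    """Suggest a per-metric pass threshold from recent production scores.

    Takes roughly the 10th-percentile score of the recent window and
    relaxes it by `buffer`, so normal variance doesn't fail anyone on
    day one. Metrics with too little history are skipped.
    """
    out = {}
    for metric, scores in history.items():
        if len(scores) < 2:
            continue  # not enough signal to suggest anything
        q = statistics.quantiles(scores, n=10)[0]  # ~10th percentile
        out[metric] = round(max(0.0, q * (1 - buffer)), 3)
    return out
```

In the real helper, `history` would come from a query over the evaluation-results table for the last N days rather than an in-memory dict.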
Shared evaluation_results persistence layer (design RFC + Phase 1 schema). Effort: 2 wk (med) for the RFC + Phase 1 schema; implementation rolls into Scorecard Phase 2.
Promoted from #10's wishlist because Scorecard / ReasoningBank / fleet trends / SxS all assume a unified persistence shape. Today only categorical results have a stable persisted schema (categorical_results); LLM-judge and trajectory-match reports are returned in-memory. The RFC needs: shared columns (session_id, metric_name, score, passed, prompt_version, endpoint, execution_mode, created_at), an evaluator-type discriminator, view conventions, and a migration plan from categorical_results (don't fork it). Without this, Scorecard Phase 2's "fleet leaderboard" only works for categorical metrics.
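As a strawman for the RFC, one possible Phase 1 DDL. The shared column list comes from the bullet above; the types, partitioning, clustering, and discriminator values are assumptions to make the shape concrete.

```python
# Hedged sketch of a Phase 1 `evaluation_results` table. Column names are
# from the RFC bullet; types and the evaluator_type values are assumed.

EVALUATION_RESULTS_DDL = """
CREATE TABLE IF NOT EXISTS `{project}.{dataset}.evaluation_results` (
  session_id     STRING NOT NULL,
  metric_name    STRING NOT NULL,
  score          FLOAT64,
  passed         BOOL,
  prompt_version STRING,
  endpoint       STRING,
  execution_mode STRING,
  evaluator_type STRING NOT NULL,  -- discriminator: categorical | llm_judge | trajectory
  created_at     TIMESTAMP NOT NULL
)
PARTITION BY DATE(created_at)
CLUSTER BY metric_name, evaluator_type
"""

SHARED_COLUMNS = ("session_id", "metric_name", "score", "passed",
                  "prompt_version", "endpoint", "execution_mode", "created_at")
```

A single wide table plus per-evaluator views (rather than one table per evaluator) is what keeps the fleet leaderboard query trivial, which is the point of the RFC.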
P1 total: ~13 eng-weeks (P0+P1 ≈ 15 eng-weeks total). For a single engineer, that's a full quarter, not a month — see resource table below.
P2 — ship in 1 quarter (strategic, larger effort)
ReasoningBank (BigQuery Agent Analytics SDK — ReasoningBank #49): per-user/per-session memory of past distilled outcomes, loaded as initial agent context
Storage layer (BQ table for memories), distillation pipeline (LLM-as-Judge + summarization), retrieval API (MemoryService.load_relevant_memories(...)), agent integration shape (callable from plugin or app). Needs a small design RFC first because memory shape affects every downstream consumer.
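A toy sketch of the retrieval surface. Only MemoryService.load_relevant_memories(...) is named above; the Memory shape and the keyword-overlap scoring are placeholders for the real BQ-backed, embedding-based design the RFC would pin down.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# In-memory stand-in for the ReasoningBank retrieval surface. The real
# service would read a BigQuery memories table; naive keyword overlap is
# a placeholder for an embedding lookup.

@dataclass
class Memory:
    user_id: str
    summary: str            # LLM-distilled outcome of a past session
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class MemoryService:
    def __init__(self, memories: list[Memory]):
        self._memories = memories

    def load_relevant_memories(self, user_id: str, query: str,
                               top_k: int = 3) -> list[str]:
        """Return distilled summaries to prepend as initial agent context."""
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(m.summary.lower().split())), m)
            for m in self._memories if m.user_id == user_id
        ]
        scored.sort(key=lambda t: (-t[0], t[1].created_at))
        return [m.summary for s, m in scored[:top_k] if s > 0]
```

Even this toy shows why the memory shape matters to every consumer: the distillation pipeline writes `summary`, the plugin reads it at session start, and the improvement-cycle demo would query the same rows.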
Compile-time code generation for structured extractors, Phase 1 (Epic: Compile-time code generation for structured trace extractors (scoped rework) #75) — only extract_bka_decision_event and the structured-event registry
Gated on #76 landing. Phase 1 scope is deliberate: known structured event schemas only, no free-text. Server-side AI.GENERATE stays as semantic fallback until precision/recall is measured.
Ontology pipeline migration to the upstream bigquery_ontology package (Migrate SDK ontology pipeline to bigquery_ontology package #38) (strategic decision pending)
Runtime contract migration, not a module swap. Maintainer needs to decide: full migration vs. keep SDK pipeline as a thin wrapper. Risk is high (consumed across 5+ modules); upside is dropping ~5K LOC of duplicate code.
SKOS import support alongside OWL (Feat: SKOS import support alongside OWL (design proposal — feedback wanted) #57)
Design-proposal phase; needs the feedback round resolved. Follows #38 because it should land in bigquery_ontology, not in this repo, if the migration goes ahead.
Runtime entity resolution primitives — OntologyRuntime, concept index lookups, EntityResolver protocol (Feat: Runtime entity resolution primitives #58)
Design proposal currently. Quoted user feedback: ~85% of brief-validation value sits at runtime, not schema time. Big payoff for production-agentic users; the design surface needs to land first.
Auto-benchmark from traces — extract high-signal success/failure pairs to seed eval suites
Builds on existing quality_report.py + agent_improvement_cycle. Generalize the cycle into a reusable extractor. Cross-links to Vertex AI Prompt Optimizer integration (post #4 in the blog series).
Streaming evaluation — Pub/Sub + continuous query path that scores sessions as events arrive
Partial scaffolding exists at _streaming_evaluation.py. Productization needs an architectural call: on-arrival vs. micro-batch, latency budgets, schema for "in-flight" partial sessions.
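The micro-batch option can be sketched as a session-idle flush. The 30-second idle window and the event-tuple shape are assumptions; this only illustrates the architectural choice, not the _streaming_evaluation.py scaffolding.

```python
from collections import defaultdict

# Toy sketch of the micro-batch option: buffer arriving events per
# session and flush a partial session for scoring once it has been
# quiet for `idle_s` seconds. Thresholds and event shape are assumed.

def micro_batch(events, idle_s: float = 30.0):
    """Group (session_id, timestamp, payload) events into flushable batches.

    `events` must be ordered by timestamp, as a streaming source would be.
    Yields (session_id, [payloads]) whenever a session has been idle
    longer than `idle_s` by the time a later event arrives; remaining
    buffers are flushed at end of stream.
    """
    buffers = defaultdict(list)
    last_seen = {}
    for session_id, ts, payload in events:
        # Flush any session that went quiet before this event arrived.
        for sid in [s for s, t in last_seen.items() if ts - t > idle_s]:
            yield sid, buffers.pop(sid)
            del last_seen[sid]
        buffers[session_id].append(payload)
        last_seen[session_id] = ts
    for sid, buf in buffers.items():   # end of stream: flush remainder
        yield sid, buf
```

On-arrival scoring would instead evaluate each event as it lands; the idle-flush variant trades latency for a coherent partial-session unit, which is the schema question the item above flags.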
P2 total: ~23 eng-weeks. With two engineers parallel-streaming, ~12 weeks (one quarter).
P3 — research / future (defer until P0-P2 ships)
Auto-skills loop based on the AutoSkill paper (arxiv 2603.01145) — agents learn reusable skills from interaction history.
Advanced EntityResolver for live agents (extends #58): embedding store + cosine-similarity layer + LLM-disambiguation tier behind the EntityResolver protocol.
SHACL validation, building on validate_extracted_graph from #76. Niche but needed for governance-heavy verticals.
V5 context graph (V5 Context Graph: TTL import, mixed extraction, temporal lineage — implementation design #12).
Spanner/MAKO backends — alternative storage to BigQuery for ontology data. Effort is significant: multiple concurrent backends multiply maintenance. Defer unless an ADK-team partner needs it; follows the bigquery_ontology migration decision (#38).
Ongoing blog series: cadence ~one per 2-3 weeks. Critical for SDK adoption. Each post drafted by the SDK lead + reviewed against the live demo.
Items to deprecate / explicitly close
_LEGACY_LLM_JUDGE_BATCH_QUERY (ML.GENERATE_TEXT path inside the LLM-judge cascade)
Audit usage and define deprecation criteria. AI.GENERATE works without a connection now (per post #3's finding), but ML.GENERATE_TEXT is also intentionally referenced in _bqml_judge (client.py:1141), insights extraction (client.py:2080), and feedback paths; there is also an active "legacy model reference" warning at client.py:602 that supports customers still passing fully-qualified BQML model refs. Don't mark it deprecated until (a) telemetry on the cascade reports how often the BQML tier actually fires, and (b) a documented migration path for BQML-model customers exists. Track as a follow-up issue, not a "next release" removal.
--strict for API-fallback judge errors
Already documented as a no-op in this case. Consider auto-disabling and warning rather than silently no-op'ing.
Combined --spec-path flag (gm compile --spec-path ...)
Already deprecated in favor of --ontology PATH --binding PATH. Schedule removal for 0.4.x.
Strategic decisions pending (need maintainer call)
Three decisions gate P2 work. Each is small to write but blocks larger implementation; sign-off needed before the matching P2 item enters a sprint.
1. Ontology pipeline migration to upstream bigquery_ontology (#38). Owner: ontology workstream lead. Required artifact: an ADR (Architecture Decision Record) — short doc, options table (full migration vs. thin wrapper vs. status quo), recommendation, expected risk surface, migration steps. ~1 day to draft. Blocks: the P2 migration item.
2. Runtime entity resolution primitives (#58). Required artifact: an API boundary note — a single page naming what lives in this repo vs. the ADK plugin vs. a new package, with the proposed EntityResolver protocol pinned. ~half a day to draft. Blocks: #58 implementation, #93 Gap 1 (live agent resolution).
3. ReasoningBank (#49). Required artifact: a storage schema RFC — BQ table shape (memory rows, distillation provenance, retrieval index), API surface (MemoryService.load_relevant_memories(...)), interaction with the existing agent_improvement_cycle demo so the two don't fork. ~1 day to draft. Blocks: ReasoningBank implementation, agent_improvement_cycle productization.
Recommended sequence: write all three artifacts in parallel over the same week (~3 person-days total), review together, sign off in one batch. Decision-fatigue on the maintainer side is lower if the three land as one review pass rather than three separate ones.
Resource budget summary
Calibrated against P0 ≈ 2 eng-weeks, P1 ≈ 13 eng-weeks (P0+P1 ≈ 15 eng-weeks total), P2 ≈ 23 eng-weeks. Hard constraint: at most one ontology bet per quarter because the workstream has 8 open issues and shared design surface — concurrent ontology work invites churn.
Headcount × time → what ships:
1 eng × 1 month (~4 eng-weeks): all of P0 + Quality Scorecard Phase 1. That's it. Don't promise Scorecard Phases 2-3 in the same month with one owner.
1 eng × 1 quarter (~12 eng-weeks): P0 + most of P1 (Scorecard Phases 1-3, baseline helper, extractor prerequisite, persistence RFC). Inheritance compilation slips.
2 eng × 1 quarter, parallel: P0 + all of P1 + one strategic P2 bet (ReasoningBank or the bigquery_ontology migration). Recommend ReasoningBank because it has the highest "demo-becomes-product" leverage.
2 eng × 2 quarters, parallel: everything in P2 except deep-research items. Sequence the ontology migration in Q2 once its ADR is signed.
3 eng × 2 quarters: everything in P0+P1+P2. P3 research items still deferred.
3 eng × 1 year: includes P3 research items (auto-skills, V5 context graph, Spanner/MAKO), one at a time.
Sequencing rationale (short version)
P0 first because it's small, asked-for, and unblocks the cost-narrative thread for blog post #2's readers (cache hit rate is exactly the "tune your token budget" follow-up). Reframed against the current ADK plugin state — the plugin already does the telemetry; the SDK adds the evaluator + version pin + fallback.
Quality Scorecard before everything else in P1 because it's already in flight (Gayathri's PR coming) and it converts the existing categorical evaluator surface from "I have to design my own metrics" → "here's a known-good rubric." Adoption boost.
Persistence RFC promoted into P1 (was P3 wishlist as BigQuery Agent Analytics SDK Feature Wish List #10 item 6) because Scorecard Phase 2's fleet leaderboard, ReasoningBank's memory-load lookups, future SxS analysis, and trend tracking all assume a shared evaluation_results shape. Decide the shape once, reuse it everywhere — otherwise each consumer forks its own table.
Ontology consolidation in P2 not P1 because it's the most expensive single decision and we don't want to block the scorecard / extractor work behind it. The ADR for Migrate SDK ontology pipeline to bigquery_ontology package #38 can run in parallel with P1 implementation.
ReasoningBank in P2 not P3 because it ties the agent_improvement_cycle demo to a real product surface. Without ReasoningBank, the demo is "look, agents can self-improve in this contained example"; with ReasoningBank, it's "your agent's memory, in a queryable BQ table."
At most one ontology bet per quarter is a hard constraint, not a preference: 8 open issues share design surface and concurrent work invites churn.
Auto-skills, V5 context graph, Spanner backends to P3 because they're large bets that need a research arm or a partner team to justify the investment. Don't block production work on them.
What this roadmap deliberately doesn't do
No commitment to specific calendar dates. Sequencing is relative; absolute dates depend on engineer count + DevRel review cycles + GA gates.
No mention of internal-only Google Cloud product integrations (Vertex AI Agent Engine, etc.) beyond what the public surface already covers. Those would be a parallel internal roadmap.
No opinion on bigquery_ontology ↔ BigQuery-Agent-Analytics-SDK repo splits beyond the existing Migrate SDK ontology pipeline to bigquery_ontology package #38 decision. That's an organizational call.
How to use this issue
Maintainer: leave reactions on items you agree with the priority of; comment with re-rankings on items you disagree with; resolve the three strategic decisions above so P2 can sequence.
Contributors: pick a P0 or P1 item that matches your interest, drop a "I can take this" comment, and the maintainer can hand the linked issue over.
Review cadence: this roadmap should be re-checked every ~6 weeks. The shape of evaluation work (scorecard, auto-benchmarking) is moving fastest right now and may push items between tiers.
Generated 2026-04-28 from a survey of the 19 open issues, the SDK surface
at target/main, and the ADK plugin at google/adk/plugins/bigquery_agent_analytics_plugin.py.
Open to revision; this is a starting point, not a contract.