fix(telemetry): surface FallbackAdapter active model/provider on parent spans #1341
When `llm.FallbackAdapter`, `tts.FallbackAdapter`, or `stt.FallbackAdapter`
wraps multiple providers, parent spans (`llm_node`, `tts_node`, `user_turn`)
were stamped with the wrapper labels (`FallbackAdapter`, `inference.STT`,
`livekit`) instead of the provider that actually handled the request. This
made `gen_ai.request.model` / `gen_ai.provider.name` useless for telemetry
consumers when fallbacks were in play.
Changes:
- llm/fallback_adapter: capture caller span on construction; on first
successful chunk, write `gen_ai.request.model` / `gen_ai.provider.name`
back onto the caller span, the inner `llm_request` span, and the run
span. Wrap each attempt in an `llm_fallback_attempt` span carrying the
attempt index plus model/provider (sketched below, after this list).
- tts/fallback_adapter: same propagation pattern via captured caller span.
- stt/fallback_adapter:
- track `_activeStt`, set when a child stream produces events or
`recognize()` succeeds; expose it via `label` / `model` / `provider`
getters so callers reading the wrapper see the active child.
- wrap each `recognize()` and stream attempt in
`stt_fallback_recognize_attempt` / `stt_fallback_stream_attempt`
spans with attempt index + model/provider.
- voice/agent_activity + audio_recognition: thread the `STT` instance
into AudioRecognition so `user_turn` re-reads the active model/provider
on each STT event. Skip `setAttribute` when nothing changed.
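
A minimal sketch of the caller-span capture and write-back pattern from the first item above, assuming the OpenTelemetry JS API. `ProviderLLM`, `tryGenerate`, and the `lk.fallback.attempt_index` key are illustrative stand-ins, not the real adapter internals, and the real code stamps the caller span on the first successful chunk rather than on full completion:

```typescript
import { SpanStatusCode, trace, type Span } from '@opentelemetry/api';

const tracer = trace.getTracer('livekit-agents');

// Illustrative stand-in for the adapter's view of a wrapped provider.
interface ProviderLLM {
  model: string;
  provider: string;
}

async function generateWithFallback(
  availableLLMs: ProviderLLM[],
  tryGenerate: (llm: ProviderLLM) => Promise<void>,
): Promise<void> {
  // Capture the caller's span (e.g. `llm_node`) while it is still active;
  // by the time a fallback fires, the active context may have moved on.
  const callerSpan: Span | undefined = trace.getActiveSpan();

  for (const [index, llm] of availableLLMs.entries()) {
    // Each attempt gets its own child span carrying the attempt index.
    const ok = await tracer.startActiveSpan('llm_fallback_attempt', async (span) => {
      span.setAttribute('lk.fallback.attempt_index', index); // key assumed
      span.setAttribute('gen_ai.request.model', llm.model);
      span.setAttribute('gen_ai.provider.name', llm.provider);
      try {
        await tryGenerate(llm);
        // Success: write the winning model/provider back onto the caller
        // span so the parent no longer shows the wrapper label.
        callerSpan?.setAttribute('gen_ai.request.model', llm.model);
        callerSpan?.setAttribute('gen_ai.provider.name', llm.provider);
        return true;
      } catch {
        span.setStatus({ code: SpanStatusCode.ERROR });
        return false;
      } finally {
        span.end();
      }
    });
    if (ok) return;
  }
  throw new Error('all providers failed');
}
```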
Cost attribution:
- voice/generation: capture final `ChatChunk.usage` and stamp exact
prompt/completion tokens on the `llm_node` span, classified as
`generation`. This becomes the single billable layer for an LLM turn,
so tracing backends that infer cost from observation type don't fall
back to a local-tokenizer estimate of the prompt text (see the sketch
after this list).
- llm/llm + tts/tts: classify `llm_request` / `tts_request` spans as
`span` (not `generation`) so wrapper + provider layers aren't double-
counted as separate cost centres. Made `_llmRequestSpan` /
`_ttsRequestSpan` `protected` so subclass implementations can write
through to them.
- LiveKit Cloud is unaffected: `gen_ai.usage.*` is still emitted on the
inner `llm_request` / `tts_request` spans for backends that read it
directly.
- telemetry/trace_types: add a new observation-type attribute (matches
the existing naming convention in this file) plus
`ATTR_FALLBACK_ATTEMPT_INDEX`.
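
To make the usage-stamping step concrete, a hedged sketch assuming the OpenTelemetry JS API. The `CompletionUsage` field names and both non-standard attribute keys are assumptions; the real constants live in telemetry/trace_types:

```typescript
import { trace } from '@opentelemetry/api';

// Mirrors the shape of the final `ChatChunk.usage` (field names assumed).
interface CompletionUsage {
  promptTokens: number;
  completionTokens: number;
}

function stampUsageOnLLMNode(usage: CompletionUsage): void {
  // `llm_node` is assumed to be the active span while generation drains.
  const llmNodeSpan = trace.getActiveSpan();
  llmNodeSpan?.setAttributes({
    // Exact provider-reported tokens, not a local-tokenizer estimate.
    'gen_ai.usage.input_tokens': usage.promptTokens,
    'gen_ai.usage.output_tokens': usage.completionTokens,
    // Classified as `generation`: the single billable layer for the turn.
    'lk.observation.type': 'generation', // key assumed
  });
}
```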
Verified end-to-end against a real call — `llm_node` model now reads the
active provider model (was `FallbackAdapter`), `user_turn` model reads
the active STT (was `inference.STT`), per-turn cost matches exact
provider math (was ~3x over).
Brings in 31 upstream commits since the branch diverged. Two real conflicts:

- agents/src/llm/llm.ts: upstream added `#providerRequestIds`; this branch
made `_llmRequestSpan` protected (so FallbackLLMStream can write through).
Kept both — protected `_llmRequestSpan` plus private `#providerRequestIds`.
- agents/src/voice/audio_recognition.ts: upstream added requestId collection
in `onSTTEvent`; this branch added `refreshUserTurnSttAttributes()` at the
same spot for FallbackAdapter live-update. Kept both, refresh first.

Other files (tts.ts, generation.ts, agent_activity.ts, trace_types.ts)
auto-merged cleanly — upstream's `#providerRequestIds` field on tts.ts
coexists with this branch's protected `_ttsRequestSpan` the same way as
llm.ts.

# Conflicts:
# agents/src/llm/llm.ts
# agents/src/voice/audio_recognition.ts
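
For reference, a rough sketch of how the merged llm.ts members coexist. Only `_llmRequestSpan`, `#providerRequestIds`, and `FallbackLLMStream` are named in the notes above; the method and everything else here is illustrative:

```typescript
import type { Span } from '@opentelemetry/api';

abstract class LLMStream {
  // This branch: protected, so fallback subclasses can write through.
  protected _llmRequestSpan?: Span;
  // Upstream: private request-id bookkeeping, invisible to subclasses.
  #providerRequestIds: string[] = [];
}

class FallbackLLMStream extends LLMStream {
  protected markActiveProvider(model: string, provider: string): void {
    // Write-through works because the span field is protected, while
    // `#providerRequestIds` stays an upstream-private detail.
    this._llmRequestSpan?.setAttribute('gen_ai.request.model', model);
    this._llmRequestSpan?.setAttribute('gen_ai.provider.name', provider);
  }
}
```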
Summary
`llm.FallbackAdapter` / `tts.FallbackAdapter` / `stt.FallbackAdapter` currently leave their wrapper labels (`FallbackAdapter`, `inference.STT`, `livekit`) on the parent spans (`llm_node`, `tts_node`, `user_turn`), so when a fallback fires the trace can't tell you which provider actually handled the turn — `gen_ai.request.model` / `gen_ai.provider.name` are stuck on the wrapper.

Separately, `llm_node` and the inner `llm_request` spans both carried `gen_ai.usage.*` and were both shaped like generation spans, so any tracing backend that infers cost from observation type ended up counting the same call 2-3 times across wrapper + provider layers.

This PR fixes both.
Trace propagation
- `llm/fallback_adapter`: capture the caller span on construction. On first successful chunk, write `gen_ai.request.model` / `gen_ai.provider.name` back onto the caller span, the inner `llm_request` span, and the run span. Wrap each attempt in an `llm_fallback_attempt` span with attempt index + model/provider.
- `tts/fallback_adapter`: same caller-span propagation pattern.
- `stt/fallback_adapter`: track `_activeStt` (set when a child stream produces events or `recognize()` succeeds) and expose it via `label` / `model` / `provider` getters so external callers reading the wrapper see the active child. Wrap attempts in `stt_fallback_recognize_attempt` / `stt_fallback_stream_attempt` spans.
- `voice/agent_activity` + `voice/audio_recognition`: thread the `STT` instance into `AudioRecognition` so `user_turn` re-reads model/provider on each STT event (FallbackAdapter only knows its active child after the first event lands). Idempotent — skips `setAttribute` if the value hasn't changed. A sketch of this refresh follows the list.
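
A hedged sketch of the idempotent refresh, assuming the OpenTelemetry JS API. Only `refreshUserTurnSttAttributes` is named in this PR; the class shape, field names, and the `ActiveSTT` interface are illustrative:

```typescript
import type { Span } from '@opentelemetry/api';

// Minimal stand-in for the model/provider getters the wrapper exposes.
interface ActiveSTT {
  model: string;
  provider: string;
}

class AudioRecognition {
  private lastModel?: string;
  private lastProvider?: string;

  constructor(
    private readonly stt: ActiveSTT,
    private readonly userTurnSpan: Span,
  ) {}

  // Called on every STT event: the FallbackAdapter only knows its active
  // child after the first event lands, so re-read each time, but skip the
  // setAttribute call when nothing changed.
  refreshUserTurnSttAttributes(): void {
    if (this.stt.model !== this.lastModel) {
      this.lastModel = this.stt.model;
      this.userTurnSpan.setAttribute('gen_ai.request.model', this.lastModel);
    }
    if (this.stt.provider !== this.lastProvider) {
      this.lastProvider = this.stt.provider;
      this.userTurnSpan.setAttribute('gen_ai.provider.name', this.lastProvider);
    }
  }
}
```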
Cost attribution

- `voice/generation`: capture final `ChatChunk.usage` and stamp exact prompt/completion tokens on the `llm_node` span, classified as `generation`. This becomes the single billable layer per LLM turn — backends that estimate tokens from prompt text no longer diverge from the provider's own billing.
- `llm/llm` + `tts/tts`: classify `llm_request` / `tts_request` spans as `span` (not `generation`) so wrapper + inner-provider layers aren't counted as separate cost centres (see the sketch after this list). LiveKit Cloud is unaffected — `gen_ai.usage.*` is still emitted on the inner spans for backends that read it directly. Made `_llmRequestSpan` / `_ttsRequestSpan` `protected` so FallbackAdapter subclasses can write through.
- `telemetry/trace_types`: add a new observation-type attribute (matches the existing naming convention in this file) plus `ATTR_FALLBACK_ATTEMPT_INDEX`.
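
And the other side of the split, sketched under the same assumptions: the inner `llm_request` span keeps its usage attributes but is classified as a plain `span`, so cost-inferring backends see only one billable layer. The helper name, usage field names, and observation-type key are assumptions:

```typescript
import type { Span } from '@opentelemetry/api';

function finalizeLLMRequestSpan(
  span: Span,
  usage: { promptTokens: number; completionTokens: number },
): void {
  span.setAttributes({
    // Still emitted, so LiveKit Cloud and any backend reading
    // gen_ai.usage.* directly keeps working unchanged.
    'gen_ai.usage.input_tokens': usage.promptTokens,
    'gen_ai.usage.output_tokens': usage.completionTokens,
    // Plain `span`, not `generation`: not a second cost centre.
    'lk.observation.type': 'span', // key assumed, see trace_types
  });
}
```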
Verified

End-to-end against a real call:

- `llm_node` model: was `FallbackAdapter`, now the active provider model
- `user_turn` model: was `inference.STT`, now the active STT (`assemblyai/u3-rt-pro`)
- billable cost centres per turn: was three (`llm_node` + 2x `llm_request`), now one (`llm_node` only)

Test plan
- `pnpm build:agents` clean
- `pnpm test` — all 29 fallback-adapter tests pass (LLM 9 + STT + TTS)
- `pnpm format:check` clean
- `@livekit/agents@1.2.7` before this PR
- `_llmRequestSpan` / `_ttsRequestSpan` exposure — open to a different shape if you'd rather keep them private