implement google live api for live session navid shad #86etkgdqu #34

Merged
navidshad merged 19 commits into dev from CU-86etkgdqu_Implement-Google-live-api-for-live-session_Navid-Shad on May 8, 2026

Conversation

@navidshad commented on May 8, 2026

🏷️ PR Title:
Enhance Gemini Live Sessions with Improved UI, Language Support, and Token Tracking

📋 Summary

This PR introduces comprehensive improvements to the Gemini live session experience, including a redesigned UI with enhanced phrase cards and chat transcript, native language selection, and better session state management. It adds a text-based message composer with microphone toggle fallback and microphone level visualization. The update also includes advanced Gemini token tracking for tools, thoughts, and video metrics, as well as refined message handling and granular dialog persistence. Localization enhancements using i18n placeholders and UI layout fixes are incorporated. Additionally, the OpenAI live-session flow is disabled and detached, with the default live session route set to Gemini. The price calculation logic is centralized and updated to apply cost markup for Gemini pricing models. Obsolete Instagram connection and status pages are removed, and overall code cleanup and documentation improvements are applied.

🔗 Related Tasks

#86etkgdqu - Implement Gemini Live API for live sessions, improve UI and session state, add native language selection, mic level visualization, token tracking, and text input enhancements; disable OpenAI live session flow and clean up related code.

📝 Additional Details

  • Phrases are now marked as practiced only on user input, not on trigger actions.
  • Added spacebar toggle and translation-masking mode in live sessions.
  • Fixed phrase card layout with fixed widths, text truncation, and hoverable titles.
  • Split live-session codebase by provider into separate OpenAI and Gemini sub-modules for better maintainability.
  • Improved Gemini live session message handling to support more granular dialog persistence.
  • Centralized pricing logic to support Gemini cost markup models.
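
For reviewers, a minimal sketch of what a centralized, provider-aware price calculation with a markup could look like. Everything here is an assumption for illustration: the names `RateCard`, `RATE_CARDS`, and `calculatePrice`, the per-token rates, and the markup values are placeholders and are not taken from this PR's code.

```typescript
// Hypothetical shapes and rates, for illustration only.
type Provider = "openai" | "gemini";

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface RateCard {
  inputPerMillion: number; // placeholder USD per 1M input tokens
  outputPerMillion: number; // placeholder USD per 1M output tokens
  markup: number; // multiplier applied on top of the raw provider cost
}

const RATE_CARDS: Record<Provider, RateCard> = {
  openai: { inputPerMillion: 4, outputPerMillion: 16, markup: 1 },
  gemini: { inputPerMillion: 2, outputPerMillion: 8, markup: 1.5 },
};

// Single entry point: every caller prices a session through this function,
// so the markup rule lives in exactly one place.
function calculatePrice(provider: Provider, usage: TokenUsage): number {
  const card = RATE_CARDS[provider];
  const raw =
    (usage.inputTokens / 1_000_000) * card.inputPerMillion +
    (usage.outputTokens / 1_000_000) * card.outputPerMillion;
  return raw * card.markup;
}
```

Keeping the provider as an explicit argument is what would let older OpenAI records and new Gemini records share one code path while still using different rates.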

📜 Commit List

  • f8650de fix: restrict phrase marking as practiced to user input instead of trigger actions #86etkgdqu
  • de37822 feat: add opt-in text-based message composer with microphone toggle fallback #86etkgdqu
  • 9b8a85f feat: reset session state on creation and improve native language detection for live sessions #86etkgdqu
  • a0b77ab feat: add native language selection for AI sessions and implement text input for live practice #86etkgdqu
  • 59b81f4 refactor: localize UI text and labels in live session components using i18n placeholders #86etkgdqu
  • 042e67f feat: add mic level visualization and spacebar toggle, and include translation-masking mode in live sessions #86etkgdqu
  • bd02253 style: fix phrase card layout with fixed widths, text truncation, and hoverable titles #86etkgdqu
  • 3b89c48 refactor: redesign live session UI with improved phrase cards, chat transcript, and session state tracking #86etkgdqu
  • 8cebf65 feat: update Gemini token tracking to include tool use, thoughts, and dedicated video token metrics #86etkgdqu
  • 33038ae refactor: improve Gemini live session message handling and granular dialog persistence #86etkgdqu
  • 6002531 chore: tweak on functions files
  • e444d70 chore: renamed openai store
  • 847486f chore: remove obsolete instagram connection and status pages
  • 119e5d8 chore: #86etkgdqu detach OpenAI flow, default live-session route to Gemini
  • 61b8208 chore: #86etkgdqu disable OpenAI live-session flow on the frontend
  • 161a413 refactor: centralize price calculation logic and apply cost markup to Gemini pricing models #86etkgdqu
  • bbb1596 refactor: #86etkgdqu split live-session by provider into openai/ and gemini/ sub-modules
  • 5294d50 chore: #86etkgdqu document and clean up Gemini live-session code
  • fbf147b feat: #86etkgdqu implement Gemini Live API for live sessions

navidshad and others added 19 commits May 7, 2026 20:02
Replaces the OpenAI Realtime entry point on the bundle page with a Gemini
`gemini-3.1-flash-live-preview` flow using the official `@google/genai` SDK
(server-issued ephemeral tokens, browser `ai.live.connect`, AudioWorklet PCM16
mic capture, queued AudioBufferSourceNode playback, auto-resume across the
15-min cap, and a `provider`-aware `cost` virtual so historical OpenAI session
records still price correctly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
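
A minimal sketch of two pieces mentioned in the commit above: converting Web Audio float samples to PCM16 for mic capture, and computing gap-free start times for queued playback. The helper names `floatToPcm16` and `scheduleChunk` are hypothetical and not taken from the PR. The real implementation runs inside an AudioWorklet and schedules `AudioBufferSourceNode` instances, both of which are omitted here.

```typescript
// Converts Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, input[i] ?? 0));
    // The negative range reaches -32768, the positive range only 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

interface PlaybackSlot {
  startAt: number; // when this chunk should start, in seconds
  queuedUntil: number; // when the queue will be drained after this chunk
}

// Picks the start time for the next audio chunk: immediately if the queue
// has run dry, otherwise right after the previously queued chunk ends.
function scheduleChunk(now: number, queuedUntil: number, durationSec: number): PlaybackSlot {
  const startAt = Math.max(now, queuedUntil);
  return { startAt, queuedUntil: startAt + durationSec };
}
```

In a browser, `now` would typically come from `AudioContext.currentTime` and `startAt` would be passed to `AudioBufferSourceNode.start()`.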
Adds module/function-level JSDoc to the Gemini store, audio worklet, server
ephemeral-token function, and the Gemini-bound mic toggle; trims
debug-cycle console logs (errors and warnings stay), tightens dialog and
phrase-selection helpers in the practice page, and fixes a precedence bug
in the server-side error message (': ' + msg || String(error)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
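
To illustrate the precedence bug mentioned above, here is a small sketch. It assumes the parenthetical shows the pre-fix expression. The function names `buggyDetail` and `fixedDetail` are hypothetical, and the actual fix in the PR may be written differently.

```typescript
// '+' binds tighter than '||', so the expression below parses as
// (": " + msg) || String(error). The left side is never an empty string,
// so the String(error) fallback can never be reached.
function buggyDetail(msg: string | undefined, error: unknown): string {
  return ": " + msg || String(error);
}

// Parenthesizing the fallback applies '||' to msg itself, which is
// presumably the intended behavior.
function fixedDetail(msg: string | undefined, error: unknown): string {
  return ": " + (msg || String(error));
}
```

With `msg` undefined, the first form yields `": undefined"` while the second falls back to the stringified error.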
…gemini/ sub-modules

Server: provider-specific config, types, utils, and ephemeral-token
functions move under `live_session/{openai,gemini}/`. The parent
`functions.ts` keeps the shared create/update functions and aggregates the
provider modules into the exported function list. `types.ts` becomes a
barrel re-exporting both providers' types so the frontend's existing
`~/types/live-session.type` import keeps working.

Frontend: `components/liveSession/StartNew.vue` is split into
`liveSession/gemini/StartNew.vue` (current behavior) and
`liveSession/openai/StartNew.vue`. `StartLiveSessionForm` accepts a
`voiceOptions` prop (defaulting to Gemini voices) so each variant can pass
its own provider-specific voice list. `pages/sessions/new.vue` defaults to
the Gemini variant; switching to OpenAI is a single component swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes every frontend runtime path to the OpenAI Realtime backend:

  - Deletes the OpenAI practice page (`pages/practice/live-session-[id].vue`),
    the OpenAI Pinia store, the OpenAI-bound mic toggle, and the OpenAI
    `StartNew` variant.
  - Rewires `FreemiumTimer` (the only component still importing the OpenAI
    store) to the Gemini store so timer-expiry auto-mute keeps working.
  - Trims stale comments that referenced the now-removed files.

Server-side `live_session/openai/` stays in place so historical OpenAI
session records continue to price correctly via the provider-aware `cost`
virtual; only the frontend exposure is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…emini

Re-attaches the OpenAI source files we removed last commit so the logic
stays in the codebase, but disconnects them from the runtime:

  - Gemini is now the default. `pages/practice/live-session-[id].vue` is
    the Gemini practice page (moved out of `live-session-gemini/`); bundle
    page and Gemini StartNew route to `/practice/live-session-{id}`.
  - OpenAI lives at `pages/practice/live-session-openai/[id].vue` (folder
    sub-path on purpose — vue-router can't disambiguate
    `live-session-:id` and `live-session-openai-:id` since they score
    equally and the alphabetically-earlier route wins, so the slash
    separator is required). Its `OPENAI_DISABLED = true` constant
    short-circuits `onMounted` and the template renders a disabled
    notice, so even direct URL visits make zero API calls.
  - OpenAI StartNew variant routes to the new
    `/practice/live-session-openai/{id}` URL; not currently rendered by
    any page but kept for reference.
  - `StartLiveSessionForm` had a `withDefaults` bug where the defaults
    factory referenced a local `const` — Vue compiler hoists `withDefaults`
    out of script-setup scope, so this broke compilation. Inlined the
    voice list literal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
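
A rough sketch of why the flat `live-session-openai-:id` route is ambiguous with `live-session-:id`, and why the slash separator removes the overlap. The regexes are hand-written approximations of the dynamic segments, not vue-router's actual matcher or ranking logic, and the helper `matchingPatterns` is hypothetical.

```typescript
// Approximations of the three route shapes discussed above.
const geminiFlat = /^\/practice\/live-session-([^/]+)$/;
const openaiFlat = /^\/practice\/live-session-openai-([^/]+)$/;
const openaiNested = /^\/practice\/live-session-openai\/([^/]+)$/;

// Returns the names of all patterns that match the given path.
function matchingPatterns(path: string, patterns: Array<[string, RegExp]>): string[] {
  return patterns.filter(([, re]) => re.test(path)).map(([name]) => name);
}

// Flat layout: both patterns accept the OpenAI URL, because the Gemini
// pattern captures "openai-42" as the id.
const flatMatches = matchingPatterns("/practice/live-session-openai-42", [
  ["geminiFlat", geminiFlat],
  ["openaiFlat", openaiFlat],
]);

// Nested layout: a dynamic segment cannot span a slash, so only the
// OpenAI pattern matches.
const nestedMatches = matchingPatterns("/practice/live-session-openai/42", [
  ["geminiFlat", geminiFlat],
  ["openaiNested", openaiNested],
]);
```

Here `flatMatches` contains both pattern names while `nestedMatches` contains only `openaiNested`, which is the overlap the folder sub-path avoids.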
…ranscript, and session state tracking #86etkgdqu
…anslation-masking mode in live sessions #86etkgdqu
@navidshad changed the title from "Cu 86etkgdqu implement google live api for live session navid shad" to "implement google live api for live session navid shad #86etkgdqu" on May 8, 2026
@navidshad merged commit e20138e into dev on May 8, 2026
1 check passed