Commit 12456e3

Merge pull request #495 from pipecat-ai/mb/improve-user-stop-timeout-example
Improve user stop timeout example
2 parents 9a12215 + c44f6f1 commit 12456e3

19 files changed

Lines changed: 176 additions & 76 deletions

getting-started/quickstart.mdx

Lines changed: 1 addition & 1 deletion
@@ -268,7 +268,7 @@ The context aggregator automatically collects user messages (after speech-to-tex
 When building web or mobile clients, you can use [Pipecat's client SDKs](/client/introduction) that communicate with your bot via the [RTVI (Real-Time Voice Interaction) protocol](/client/rtvi-standard). In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events:

 ```python
-rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
+rtvi = RTVIProcessor()
 ```

 See below for how we incorporate the RTVI processor into the pipeline.

guides/fundamentals/custom-frame-processor.mdx

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ class MetricsFrameLogger(FrameProcessor):
 This frame processor looks for `MetricsFrames`. When it sees one, it formats the data and logs it.

 It uses this `format_metrics` function:
+
 ```python
 def format_metrics(metrics, indent=0):
     lines = []
@@ -78,6 +79,7 @@ pipeline = Pipeline(
         metrics_frame_processor, # Our custom FrameProcessor that pretty prints metrics frames
     ]
 )
+```

 With this positioning, the `MetricsFrameLogger` FrameProcessor will receive every MetricsFrame in the pipeline.

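The hunk above shows only the first lines of `format_metrics`. As a rough sketch of what a recursive metrics pretty-printer of that shape could look like (an illustrative guess, not the guide's actual implementation):

```python
def format_metrics(metrics, indent=0):
    """Recursively format a metrics mapping as indented `key: value` lines.

    Illustrative sketch only -- the guide's real implementation may differ.
    """
    lines = []
    pad = " " * indent
    for key, value in metrics.items():
        if isinstance(value, dict):
            # Nested metrics get a header line and an indented sub-block.
            lines.append(f"{pad}{key}:")
            lines.extend(format_metrics(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Example: pretty-print a nested metrics payload (hypothetical field names).
print("\n".join(format_metrics({"ttfb": {"llm": 0.31, "tts": 0.12}})))
```

Returning a list of lines (rather than one string) makes it easy for the caller to log each line separately.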

guides/fundamentals/user-input-muting.mdx

Lines changed: 8 additions & 3 deletions
@@ -38,7 +38,11 @@ This prevents user speech from being processed during muted periods.
 Pipecat provides several built-in strategies for determining when to mute user input:

 <CardGroup cols={2}>
-  <Card title="FirstSpeechUserMuteStrategy" icon="microphone-slash" iconType="duotone">
+  <Card
+    title="FirstSpeechUserMuteStrategy"
+    icon="microphone-slash"
+    iconType="duotone"
+  >
     Mute only during the bot's first speech utterance. Useful for introductions
     when you want the bot to complete its greeting before the user can speak.
   </Card>
@@ -61,8 +65,9 @@ Pipecat provides several built-in strategies for determining when to mute user i
 </CardGroup>

 <Warning>
-  The `FirstSpeechUserMuteStrategy` and `MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not
-  be used together as they handle the first bot speech differently.
+  The `FirstSpeechUserMuteStrategy` and
+  `MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not be used
+  together as they handle the first bot speech differently.
 </Warning>

 ## Basic Implementation
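The `FirstSpeechUserMuteStrategy` card above describes muting only while the bot's first utterance is in progress. That decision logic can be modeled as a tiny state machine (a toy sketch, not Pipecat's API or class):

```python
class FirstSpeechMuteSketch:
    """Toy model of the behavior described above: the user is muted only
    while the bot's *first* utterance is in progress."""

    def __init__(self):
        self._bot_speaking = False
        self._first_speech_done = False

    def on_bot_started_speaking(self):
        self._bot_speaking = True

    def on_bot_stopped_speaking(self):
        if self._bot_speaking:
            self._first_speech_done = True
        self._bot_speaking = False

    @property
    def muted(self):
        # Mute only during the first utterance; later speech never mutes.
        return self._bot_speaking and not self._first_speech_done


strategy = FirstSpeechMuteSketch()
strategy.on_bot_started_speaking()
print(strategy.muted)  # True: the greeting is in progress
strategy.on_bot_stopped_speaking()
strategy.on_bot_started_speaking()
print(strategy.muted)  # False: only the first utterance mutes
```

The sketch also makes the warning concrete: a strategy that unmutes when the first utterance *ends* would disagree with one that unmutes when the first bot *turn* completes, so the two should not be combined.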

guides/learn/context-management.mdx

Lines changed: 4 additions & 1 deletion
@@ -68,7 +68,10 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
 ```

 <Note>
-  The context aggregator also supports configuring [user turn strategies](/server/utilities/user-turn-strategies) and [user mute strategies](/server/utilities/user-mute-strategies) via `LLMUserAggregatorParams`.
+  The context aggregator also supports configuring [user turn
+  strategies](/server/utilities/user-turn-strategies) and [user mute
+  strategies](/server/utilities/user-mute-strategies) via
+  `LLMUserAggregatorParams`.
 </Note>

 **About LLMContext:**
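The hunk's context line shows `LLMContextAggregatorPair` producing a user and an assistant aggregator over one context. The shape of that idea, reduced to a toy (this mirrors the pattern only, not Pipecat's real classes or signatures):

```python
# Toy sketch of the aggregator-pair pattern: two small wrappers append
# user and assistant turns into one shared message list, so both sides
# of the conversation accumulate in a single context object.
class ContextSketch:
    def __init__(self):
        self.messages = []


class AggregatorSketch:
    def __init__(self, context, role):
        self.context = context
        self.role = role

    def add(self, text):
        self.context.messages.append({"role": self.role, "content": text})


context = ContextSketch()
user_aggregator = AggregatorSketch(context, "user")
assistant_aggregator = AggregatorSketch(context, "assistant")

user_aggregator.add("What's the weather?")
assistant_aggregator.add("It's sunny today.")
print(context.messages)
```

Because both aggregators write to the same `context`, each new LLM call sees the full alternating history.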

guides/learn/pipeline.mdx

Lines changed: 2 additions & 2 deletions
@@ -160,8 +160,8 @@ Understanding data flow is crucial for building effective pipelines:
 4. `tts` converts text frames to `TTSAudioRawFrame`s, `AggregatedTextFrame`s, and `TTSTextFrame`s
 5. `transport.output()` creates `OutputAudioRawFrame`s and sends audio back to user

-* Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
-  out of the TTS.
+- Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
+  out of the TTS.

 ### Frame Propagation
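The note in the hunk above says an `LLMTextProcessor` can pre-aggregate streamed `LLMTextFrame`s into `AggregatedTextFrame`s before they reach the TTS. The aggregation step itself can be sketched as joining token chunks and emitting whole sentences (illustrative only, not `LLMTextProcessor`'s real code):

```python
# Sketch of pre-aggregation: collect streamed LLM text chunks and emit
# them sentence by sentence, so downstream TTS receives whole sentences.
def aggregate_sentences(chunks):
    sentences, buffer = [], ""
    for chunk in chunks:
        buffer += chunk
        while any(p in buffer for p in ".!?"):
            # Split at the first sentence-ending punctuation mark.
            idx = min(i for i, ch in enumerate(buffer) if ch in ".!?")
            sentences.append(buffer[: idx + 1].strip())
            buffer = buffer[idx + 1 :]
    if buffer.strip():
        sentences.append(buffer.strip())
    return sentences


print(aggregate_sentences(["Hel", "lo there. How ", "are you?"]))
# → ['Hello there.', 'How are you?']
```

Doing this upstream of the TTS "simply moves the aggregation step out of the TTS", exactly as the note says; the TTS then synthesizes one sentence at a time.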

guides/learn/speech-input.mdx

Lines changed: 8 additions & 2 deletions
@@ -10,11 +10,13 @@ A key to natural conversations is properly detecting when the user starts and st
 Pipecat uses [user turn strategies](/server/utilities/user-turn-strategies) to determine when user turns start and end. These strategies can use different techniques:

 **For detecting turn start:**
+
 - Voice Activity Detection (VAD): triggers when speech is detected
 - Transcription-based (fallback): triggers when transcription is received but VAD didn't detect speech
 - Minimum words: waits for a minimum number of spoken words before triggering

 **For detecting turn end:**
+
 - Transcription-based: analyzes transcription to determine when the user is done
 - Turn detection model: uses AI to understand if the user has finished their thought

@@ -90,7 +92,9 @@ While VAD detects speech vs. silence, it can't understand linguistic context. A
 2. **Turn End**: When the stop strategy determines the user is done, it emits `UserStoppedSpeakingFrame`

 <Note>
-  VAD also emits its own frames (`VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection. These are inputs to the turn strategies, not the final turn decisions.
+  VAD also emits its own frames (`VADUserStartedSpeakingFrame`,
+  `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection.
+  These are inputs to the turn strategies, not the final turn decisions.
 </Note>

 ### Detecting Turn End
@@ -141,6 +145,7 @@ When using Smart Turn, configure VAD with a low `stop_secs` (0.2) so the model c
 Interruptions stop the bot when the user starts speaking. This is controlled by the `enable_interruptions` parameter on start strategies (enabled by default).

 When a user turn starts with interruptions enabled:
+
 1. Bot immediately stops speaking
 2. Pending audio and text is cleared
 3. Pipeline ready for new user input
@@ -154,7 +159,8 @@ start_strategy = VADUserTurnStartStrategy(enable_interruptions=False)
 ```

 <Note>
-  Keep interruptions enabled (default) for natural conversations. This enables users to interrupt the bot mid-sentence, just like human conversations.
+  Keep interruptions enabled (default) for natural conversations. This enables
+  users to interrupt the bot mid-sentence, just like human conversations.
 </Note>

 ## Best Practices
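The hunks above mention configuring VAD with a low `stop_secs` so a turn ends after a short stretch of silence following speech. That decision can be modeled as a toy function over `(timestamp, is_speech)` samples (a sketch of the concept, not Pipecat's VAD implementation):

```python
# Toy model of VAD-based turn end: the user's turn is considered over once
# `stop_secs` of continuous silence follows detected speech.
def turn_end_time(samples, stop_secs=0.2):
    silence_start = None
    spoke = False
    for t, is_speech in samples:
        if is_speech:
            spoke = True
            silence_start = None  # any speech resets the silence timer
        elif spoke:
            if silence_start is None:
                silence_start = t
            if t - silence_start >= stop_secs:
                return t  # enough trailing silence: the turn has ended
    return None  # still speaking, or the user never spoke


samples = [(0.0, True), (0.5, True), (0.6, False), (0.7, False), (0.9, False)]
print(turn_end_time(samples, stop_secs=0.2))  # → 0.9
```

A low `stop_secs` makes this trigger quickly, which is why the guide pairs it with a turn model (Smart Turn) that can veto premature endings based on linguistic context.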

guides/learn/text-to-speech.mdx

Lines changed: 10 additions & 2 deletions
@@ -181,7 +181,11 @@ For TTS-specific text preprocessing, you can provide custom text transforms that
 Text transforms are registered directly on the TTS service instance via the `add_text_transformer()` method or during initialization using the `text_transforms` parameter.

 <Note>
-  The intentions of text transforms are meant to be TTS-specific modifications that do not affect the underlying LLM text or context. That said, since the context aggregator attempts to base its context on what was actually spoken, for services that support word timestamps, like Cartesia, ElevenLabs, and Rime,these transforms will modify the context as they modify what is spoken.
+  Text transforms are meant to be TTS-specific modifications that do not
+  affect the underlying LLM text or context. That said, since the context
+  aggregator attempts to base its context on what was actually spoken, for
+  services that support word timestamps, like Cartesia, ElevenLabs, and Rime,
+  these transforms will modify the context as they modify what is spoken.
 </Note>

 ```python
@@ -227,7 +231,11 @@ tts.add_text_transformer(replace_acronyms, "*") # Apply to all text

 ### Text Filters

-<Warning>Text filters are no longer the preferred method for text preprocessing and will be deprecated in future releases. Instead, you should use one of the methods described above.</Warning>
+<Warning>
+  Text filters are no longer the preferred method for text preprocessing and
+  will be deprecated in future releases. Instead, you should use one of the
+  methods described above.
+</Warning>

 Apply preprocessing to text before synthesis:
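The second hunk's context line registers a `replace_acronyms` transform with `tts.add_text_transformer(replace_acronyms, "*")`. A standalone sketch of what such a transform could do, spelling out acronyms so the TTS pronounces them letter by letter (the function body and signature here are guesses, not the guide's actual code):

```python
import re


def replace_acronyms(text):
    """Spell out ALL-CAPS acronyms (2+ letters) with spaces between letters,
    e.g. "NASA" -> "N A S A", so a TTS voice reads them letter by letter.

    Illustrative sketch of a text transform; the guide's version may differ.
    """
    return re.sub(r"\b([A-Z]{2,})\b", lambda m: " ".join(m.group(1)), text)


print(replace_acronyms("Send the PDF to NASA."))  # → "Send the P D F to N A S A."
```

Per the note above, on word-timestamp services this spelled-out form is also what lands in the context, since the aggregator records what was actually spoken.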

guides/learn/transports.mdx

Lines changed: 3 additions & 1 deletion
@@ -131,7 +131,9 @@ params = TransportParams(
 </Note>

 <Tip>
-  For advanced turn detection (like Smart Turn), configure [User Turn Strategies](/server/utilities/user-turn-strategies) on the context aggregator instead of using the transport's turn_analyzer parameter.
+  For advanced turn detection (like Smart Turn), configure [User Turn
+  Strategies](/server/utilities/user-turn-strategies) on the context aggregator
+  instead of using the transport's turn_analyzer parameter.
 </Tip>

 <Card

server/pipeline/pipeline-params.mdx

Lines changed: 5 additions & 3 deletions
@@ -30,10 +30,12 @@ task = PipelineTask(pipeline, params=params)

 <ParamField path="allow_interruptions" type="bool" default="False">
   <Warning>
-    DEPRECATED: This parameter is deprecated. Configure interruption behavior via [User Turn Strategies](/server/utilities/user-turn-strategies) instead. See the `enable_interruptions` parameter on start strategies.
+    DEPRECATED: This parameter is deprecated. Configure interruption behavior
+    via [User Turn Strategies](/server/utilities/user-turn-strategies) instead.
+    See the `enable_interruptions` parameter on start strategies.
   </Warning>
-  Whether to allow pipeline interruptions. When enabled, a user's speech will
-  immediately interrupt the bot's response.
+  Whether to allow pipeline interruptions. When enabled, a user's speech will immediately
+  interrupt the bot's response.
 </ParamField>

 <ParamField path="audio_in_sample_rate" type="int" default="16000">

server/pipeline/pipeline-task.mdx

Lines changed: 13 additions & 5 deletions
@@ -80,19 +80,27 @@ await runner.run(task)
 </ParamField>

 <ParamField path="enable_tracing" type="bool" default="False">
-  Whether to enable OpenTelemetry tracing. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Whether to enable OpenTelemetry tracing. See [The OpenTelemetry
+  guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 <ParamField path="enable_turn_tracking" type="bool" default="False">
-  Whether to enable turn tracking. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Whether to enable turn tracking. See [The OpenTelemetry
+  guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 <ParamField path="conversation_id" type="Optional[str]" default="None">
-  Custom ID for the conversation. If not provided, a UUID will be generated. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Custom ID for the conversation. If not provided, a UUID will be generated. See
+  [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
 </ParamField>

-<ParamField path="additional_span_attributes" type="Optional[dict]" default="None">
-  Any additional attributes to add to top-level OpenTelemetry conversation span. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+<ParamField
+  path="additional_span_attributes"
+  type="Optional[dict]"
+  default="None"
+>
+  Any additional attributes to add to top-level OpenTelemetry conversation span.
+  See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 ## Methods
