Commit 12456e3

Merge pull request #495 from pipecat-ai/mb/improve-user-stop-timeout-example
Improve user stop timeout example
2 parents 9a12215 + c44f6f1 commit 12456e3

19 files changed

Lines changed: 176 additions & 76 deletions

getting-started/quickstart.mdx

Lines changed: 1 addition & 1 deletion
@@ -268,7 +268,7 @@ The context aggregator automatically collects user messages (after speech-to-tex
 When building web or mobile clients, you can use [Pipecat's client SDKs](/client/introduction) that communicate with your bot via the [RTVI (Real-Time Voice Interaction) protocol](/client/rtvi-standard). In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events:

 ```python
-rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
+rtvi = RTVIProcessor()
 ```

 See below for how we incorporate the RTVI processor into the pipeline.

guides/fundamentals/custom-frame-processor.mdx

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@ class MetricsFrameLogger(FrameProcessor):
 This frame processor looks for `MetricsFrames`. When it sees one, it formats the data and logs it.

 It uses this `format_metrics` function:
+
 ```python
 def format_metrics(metrics, indent=0):
     lines = []
@@ -78,6 +79,7 @@ pipeline = Pipeline(
         metrics_frame_processor, # Our custom FrameProcessor that pretty prints metrics frames
     ]
 )
+```

 With this positioning, the `MetricsFrameLogger` FrameProcessor will receive every MetricsFrame in the pipeline.

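The hunk above shows only the first lines of `format_metrics`. As a rough sketch of what a recursive metrics pretty-printer of that shape could look like (an illustrative guess, not the guide's actual implementation):

```python
def format_metrics(metrics, indent=0):
    """Recursively format a metrics mapping as indented `key: value` lines.

    Illustrative sketch only -- the guide's real implementation may differ.
    """
    lines = []
    pad = " " * indent
    for key, value in metrics.items():
        if isinstance(value, dict):
            # Nested metrics get a header line and an indented sub-block.
            lines.append(f"{pad}{key}:")
            lines.extend(format_metrics(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Example: pretty-print a nested metrics payload (hypothetical field names).
print("\n".join(format_metrics({"ttfb": {"llm": 0.31, "tts": 0.12}})))
```

Returning a list of lines (rather than one string) makes it easy for the caller to log each line separately.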

guides/fundamentals/user-input-muting.mdx

Lines changed: 8 additions & 3 deletions
@@ -38,7 +38,11 @@ This prevents user speech from being processed during muted periods.
 Pipecat provides several built-in strategies for determining when to mute user input:

 <CardGroup cols={2}>
-  <Card title="FirstSpeechUserMuteStrategy" icon="microphone-slash" iconType="duotone">
+  <Card
+    title="FirstSpeechUserMuteStrategy"
+    icon="microphone-slash"
+    iconType="duotone"
+  >
     Mute only during the bot's first speech utterance. Useful for introductions
     when you want the bot to complete its greeting before the user can speak.
   </Card>
@@ -61,8 +65,9 @@ Pipecat provides several built-in strategies for determining when to mute user i
 </CardGroup>

 <Warning>
-  The `FirstSpeechUserMuteStrategy` and `MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not
-  be used together as they handle the first bot speech differently.
+  The `FirstSpeechUserMuteStrategy` and
+  `MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not be used
+  together as they handle the first bot speech differently.
 </Warning>

 ## Basic Implementation
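The `FirstSpeechUserMuteStrategy` card above describes muting only while the bot's first utterance is in progress. That decision logic can be modeled as a tiny state machine (a toy sketch, not Pipecat's API or class):

```python
class FirstSpeechMuteSketch:
    """Toy model of the behavior described above: the user is muted only
    while the bot's *first* utterance is in progress."""

    def __init__(self):
        self._bot_speaking = False
        self._first_speech_done = False

    def on_bot_started_speaking(self):
        self._bot_speaking = True

    def on_bot_stopped_speaking(self):
        if self._bot_speaking:
            self._first_speech_done = True
        self._bot_speaking = False

    @property
    def muted(self):
        # Mute only during the first utterance; later speech never mutes.
        return self._bot_speaking and not self._first_speech_done


strategy = FirstSpeechMuteSketch()
strategy.on_bot_started_speaking()
print(strategy.muted)  # True: the greeting is in progress
strategy.on_bot_stopped_speaking()
strategy.on_bot_started_speaking()
print(strategy.muted)  # False: only the first utterance mutes
```

The sketch also makes the warning concrete: a strategy that unmutes when the first utterance *ends* would disagree with one that unmutes when the first bot *turn* completes, so the two should not be combined.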

guides/learn/context-management.mdx

Lines changed: 4 additions & 1 deletion
@@ -68,7 +68,10 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
 ```

 <Note>
-  The context aggregator also supports configuring [user turn strategies](/server/utilities/user-turn-strategies) and [user mute strategies](/server/utilities/user-mute-strategies) via `LLMUserAggregatorParams`.
+  The context aggregator also supports configuring [user turn
+  strategies](/server/utilities/user-turn-strategies) and [user mute
+  strategies](/server/utilities/user-mute-strategies) via
+  `LLMUserAggregatorParams`.
 </Note>

 **About LLMContext:**
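The hunk's context line shows `LLMContextAggregatorPair` producing a user and an assistant aggregator over one context. The shape of that idea, reduced to a toy (this mirrors the pattern only, not Pipecat's real classes or signatures):

```python
# Toy sketch of the aggregator-pair pattern: two small wrappers append
# user and assistant turns into one shared message list, so both sides
# of the conversation accumulate in a single context object.
class ContextSketch:
    def __init__(self):
        self.messages = []


class AggregatorSketch:
    def __init__(self, context, role):
        self.context = context
        self.role = role

    def add(self, text):
        self.context.messages.append({"role": self.role, "content": text})


context = ContextSketch()
user_aggregator = AggregatorSketch(context, "user")
assistant_aggregator = AggregatorSketch(context, "assistant")

user_aggregator.add("What's the weather?")
assistant_aggregator.add("It's sunny today.")
print(context.messages)
```

Because both aggregators write to the same `context`, each new LLM call sees the full alternating history.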

guides/learn/pipeline.mdx

Lines changed: 2 additions & 2 deletions
@@ -160,8 +160,8 @@ Understanding data flow is crucial for building effective pipelines:
 4. `tts` converts text frames to `TTSAudioRawFrame`s, `AggregatedTextFrame`s, and `TTSTextFrame`s
 5. `transport.output()` creates `OutputAudioRawFrame`s and sends audio back to user

-* Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
-  out of the TTS.
+- Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
+  out of the TTS.

 ### Frame Propagation
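The note in the hunk above says an `LLMTextProcessor` can pre-aggregate streamed `LLMTextFrame`s into `AggregatedTextFrame`s before they reach the TTS. The aggregation step itself can be sketched as joining token chunks and emitting whole sentences (illustrative only, not `LLMTextProcessor`'s real code):

```python
# Sketch of pre-aggregation: collect streamed LLM text chunks and emit
# them sentence by sentence, so downstream TTS receives whole sentences.
def aggregate_sentences(chunks):
    sentences, buffer = [], ""
    for chunk in chunks:
        buffer += chunk
        while any(p in buffer for p in ".!?"):
            # Split at the first sentence-ending punctuation mark.
            idx = min(i for i, ch in enumerate(buffer) if ch in ".!?")
            sentences.append(buffer[: idx + 1].strip())
            buffer = buffer[idx + 1 :]
    if buffer.strip():
        sentences.append(buffer.strip())
    return sentences


print(aggregate_sentences(["Hel", "lo there. How ", "are you?"]))
# → ['Hello there.', 'How are you?']
```

Doing this upstream of the TTS "simply moves the aggregation step out of the TTS", exactly as the note says; the TTS then synthesizes one sentence at a time.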

guides/learn/speech-input.mdx

Lines changed: 8 additions & 2 deletions
@@ -10,11 +10,13 @@ A key to natural conversations is properly detecting when the user starts and st
 Pipecat uses [user turn strategies](/server/utilities/user-turn-strategies) to determine when user turns start and end. These strategies can use different techniques:

 **For detecting turn start:**
+
 - Voice Activity Detection (VAD): triggers when speech is detected
 - Transcription-based (fallback): triggers when transcription is received but VAD didn't detect speech
 - Minimum words: waits for a minimum number of spoken words before triggering

 **For detecting turn end:**
+
 - Transcription-based: analyzes transcription to determine when the user is done
 - Turn detection model: uses AI to understand if the user has finished their thought

@@ -90,7 +92,9 @@ While VAD detects speech vs. silence, it can't understand linguistic context. A
 2. **Turn End**: When the stop strategy determines the user is done, it emits `UserStoppedSpeakingFrame`

 <Note>
-  VAD also emits its own frames (`VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection. These are inputs to the turn strategies, not the final turn decisions.
+  VAD also emits its own frames (`VADUserStartedSpeakingFrame`,
+  `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection.
+  These are inputs to the turn strategies, not the final turn decisions.
 </Note>

 ### Detecting Turn End
@@ -141,6 +145,7 @@ When using Smart Turn, configure VAD with a low `stop_secs` (0.2) so the model c
 Interruptions stop the bot when the user starts speaking. This is controlled by the `enable_interruptions` parameter on start strategies (enabled by default).

 When a user turn starts with interruptions enabled:
+
 1. Bot immediately stops speaking
 2. Pending audio and text is cleared
 3. Pipeline ready for new user input
@@ -154,7 +159,8 @@ start_strategy = VADUserTurnStartStrategy(enable_interruptions=False)
 ```

 <Note>
-  Keep interruptions enabled (default) for natural conversations. This enables users to interrupt the bot mid-sentence, just like human conversations.
+  Keep interruptions enabled (default) for natural conversations. This enables
+  users to interrupt the bot mid-sentence, just like human conversations.
 </Note>

 ## Best Practices
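The hunks above mention configuring VAD with a low `stop_secs` so a turn ends after a short stretch of silence following speech. That decision can be modeled as a toy function over `(timestamp, is_speech)` samples (a sketch of the concept, not Pipecat's VAD implementation):

```python
# Toy model of VAD-based turn end: the user's turn is considered over once
# `stop_secs` of continuous silence follows detected speech.
def turn_end_time(samples, stop_secs=0.2):
    silence_start = None
    spoke = False
    for t, is_speech in samples:
        if is_speech:
            spoke = True
            silence_start = None  # any speech resets the silence timer
        elif spoke:
            if silence_start is None:
                silence_start = t
            if t - silence_start >= stop_secs:
                return t  # enough trailing silence: the turn has ended
    return None  # still speaking, or the user never spoke


samples = [(0.0, True), (0.5, True), (0.6, False), (0.7, False), (0.9, False)]
print(turn_end_time(samples, stop_secs=0.2))  # → 0.9
```

A low `stop_secs` makes this trigger quickly, which is why the guide pairs it with a turn model (Smart Turn) that can veto premature endings based on linguistic context.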

guides/learn/text-to-speech.mdx

Lines changed: 10 additions & 2 deletions
@@ -181,7 +181,11 @@ For TTS-specific text preprocessing, you can provide custom text transforms that
 Text transforms are registered directly on the TTS service instance via the `add_text_transformer()` method or during initialization using the `text_transforms` parameter.

 <Note>
-  The intentions of text transforms are meant to be TTS-specific modifications that do not affect the underlying LLM text or context. That said, since the context aggregator attempts to base its context on what was actually spoken, for services that support word timestamps, like Cartesia, ElevenLabs, and Rime,these transforms will modify the context as they modify what is spoken.
+  Text transforms are meant to be TTS-specific modifications that do not
+  affect the underlying LLM text or context. That said, since the context
+  aggregator attempts to base its context on what was actually spoken, for
+  services that support word timestamps, like Cartesia, ElevenLabs, and Rime,
+  these transforms will modify the context as they modify what is spoken.
 </Note>

 ```python
@@ -227,7 +231,11 @@ tts.add_text_transformer(replace_acronyms, "*") # Apply to all text

 ### Text Filters

-<Warning>Text filters are no longer the preferred method for text preprocessing and will be deprecated in future releases. Instead, you should use one of the methods described above.</Warning>
+<Warning>
+  Text filters are no longer the preferred method for text preprocessing and
+  will be deprecated in future releases. Instead, you should use one of the
+  methods described above.
+</Warning>

 Apply preprocessing to text before synthesis:
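The second hunk's context line registers a `replace_acronyms` transform with `tts.add_text_transformer(replace_acronyms, "*")`. A standalone sketch of what such a transform could do, spelling out acronyms so the TTS pronounces them letter by letter (the function body and signature here are guesses, not the guide's actual code):

```python
import re


def replace_acronyms(text):
    """Spell out ALL-CAPS acronyms (2+ letters) with spaces between letters,
    e.g. "NASA" -> "N A S A", so a TTS voice reads them letter by letter.

    Illustrative sketch of a text transform; the guide's version may differ.
    """
    return re.sub(r"\b([A-Z]{2,})\b", lambda m: " ".join(m.group(1)), text)


print(replace_acronyms("Send the PDF to NASA."))  # → "Send the P D F to N A S A."
```

Per the note above, on word-timestamp services this spelled-out form is also what lands in the context, since the aggregator records what was actually spoken.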

guides/learn/transports.mdx

Lines changed: 3 additions & 1 deletion
@@ -131,7 +131,9 @@ params = TransportParams(
 </Note>

 <Tip>
-  For advanced turn detection (like Smart Turn), configure [User Turn Strategies](/server/utilities/user-turn-strategies) on the context aggregator instead of using the transport's turn_analyzer parameter.
+  For advanced turn detection (like Smart Turn), configure [User Turn
+  Strategies](/server/utilities/user-turn-strategies) on the context aggregator
+  instead of using the transport's turn_analyzer parameter.
 </Tip>

 <Card

server/pipeline/pipeline-params.mdx

Lines changed: 5 additions & 3 deletions
@@ -30,10 +30,12 @@ task = PipelineTask(pipeline, params=params)

 <ParamField path="allow_interruptions" type="bool" default="False">
   <Warning>
-    DEPRECATED: This parameter is deprecated. Configure interruption behavior via [User Turn Strategies](/server/utilities/user-turn-strategies) instead. See the `enable_interruptions` parameter on start strategies.
+    DEPRECATED: This parameter is deprecated. Configure interruption behavior
+    via [User Turn Strategies](/server/utilities/user-turn-strategies) instead.
+    See the `enable_interruptions` parameter on start strategies.
   </Warning>
-  Whether to allow pipeline interruptions. When enabled, a user's speech will
-  immediately interrupt the bot's response.
+  Whether to allow pipeline interruptions. When enabled, a user's speech will immediately
+  interrupt the bot's response.
 </ParamField>

 <ParamField path="audio_in_sample_rate" type="int" default="16000">

server/pipeline/pipeline-task.mdx

Lines changed: 13 additions & 5 deletions
@@ -80,19 +80,27 @@ await runner.run(task)
 </ParamField>

 <ParamField path="enable_tracing" type="bool" default="False">
-  Whether to enable OpenTelemetry tracing. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Whether to enable OpenTelemetry tracing. See [The OpenTelemetry
+  guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 <ParamField path="enable_turn_tracking" type="bool" default="False">
-  Whether to enable turn tracking. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Whether to enable turn tracking. See [The OpenTelemetry
+  guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 <ParamField path="conversation_id" type="Optional[str]" default="None">
-  Custom ID for the conversation. If not provided, a UUID will be generated. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+  Custom ID for the conversation. If not provided, a UUID will be generated. See
+  [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
 </ParamField>

-<ParamField path="additional_span_attributes" type="Optional[dict]" default="None">
-  Any additional attributes to add to top-level OpenTelemetry conversation span. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+<ParamField
+  path="additional_span_attributes"
+  type="Optional[dict]"
+  default="None"
+>
+  Any additional attributes to add to top-level OpenTelemetry conversation span.
+  See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
 </ParamField>

 ## Methods
