`getting-started/quickstart.mdx` (1 addition, 1 deletion)

```diff
@@ -268,7 +268,7 @@ The context aggregator automatically collects user messages (after speech-to-tex
 When building web or mobile clients, you can use [Pipecat's client SDKs](/client/introduction) that communicate with your bot via the [RTVI (Real-Time Voice Interaction) protocol](/client/rtvi-standard). In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events:
-The context aggregator also supports configuring [user turn strategies](/server/utilities/user-turn-strategies) and [user mute strategies](/server/utilities/user-mute-strategies) via `LLMUserAggregatorParams`.
+The context aggregator also supports configuring [user turn
+strategies](/server/utilities/user-turn-strategies) and [user mute
+strategies](/server/utilities/user-mute-strategies) via
+`LLMUserAggregatorParams`.
```
`guides/learn/pipeline.mdx` (2 additions, 2 deletions)

```diff
@@ -160,8 +160,8 @@ Understanding data flow is crucial for building effective pipelines:
 4. `tts` converts text frames to `TTSAudioRawFrame`s, `AggregatedTextFrame`s, and `TTSTextFrame`s
 5. `transport.output()` creates `OutputAudioRawFrame`s and sends audio back to user
-* Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
-  out of the TTS.
+- Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
+  out of the TTS.
```
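The note in the hunk above says an `LLMTextProcessor` pre-aggregates `LLMTextFrame`s into `AggregatedTextFrame`s before TTS. As a standalone sketch of that aggregation step (our own plain-Python simplification, not Pipecat's implementation):

```python
# Toy sketch of pre-aggregation: collect incremental LLM text tokens and emit
# sentence-sized chunks, which is conceptually what sits between `llm` and
# `tts` here. This is an illustration, not Pipecat code.

SENTENCE_ENDINGS = (".", "!", "?")

def aggregate_tokens(tokens):
    """Collect streamed text tokens; emit a chunk at each sentence boundary."""
    buffer = ""
    chunks = []
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            chunks.append(buffer.strip())
            buffer = ""
    if buffer.strip():  # flush a trailing partial sentence
        chunks.append(buffer.strip())
    return chunks

print(aggregate_tokens(["Hel", "lo the", "re. ", "How are", " you?"]))
# → ['Hello there.', 'How are you?']
```

Moving this step upstream does not change what is spoken; it only changes which processor performs the sentence aggregation.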
`guides/learn/speech-input.mdx` (8 additions, 2 deletions)

```diff
@@ -10,11 +10,13 @@ A key to natural conversations is properly detecting when the user starts and st
 Pipecat uses [user turn strategies](/server/utilities/user-turn-strategies) to determine when user turns start and end. These strategies can use different techniques:

 **For detecting turn start:**
+
 - Voice Activity Detection (VAD): triggers when speech is detected
 - Transcription-based (fallback): triggers when transcription is received but VAD didn't detect speech
 - Minimum words: waits for a minimum number of spoken words before triggering

 **For detecting turn end:**
+
 - Transcription-based: analyzes transcription to determine when the user is done
 - Turn detection model: uses AI to understand if the user has finished their thought
@@ -90,7 +92,9 @@ While VAD detects speech vs. silence, it can't understand linguistic context. A
 2. **Turn End**: When the stop strategy determines the user is done, it emits `UserStoppedSpeakingFrame`

 <Note>
-  VAD also emits its own frames (`VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection. These are inputs to the turn strategies, not the final turn decisions.
+  VAD also emits its own frames (`VADUserStartedSpeakingFrame`,
+  `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection.
+  These are inputs to the turn strategies, not the final turn decisions.
 </Note>
@@ -141,6 +145,7 @@ When using Smart Turn, configure VAD with a low `stop_secs` (0.2) so the model c
 Interruptions stop the bot when the user starts speaking. This is controlled by the `enable_interruptions` parameter on start strategies (enabled by default).

 When a user turn starts with interruptions enabled:
```
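The start/stop strategy mechanics above can be sketched as a toy state machine over raw VAD events. The event format and timing logic here are our own assumptions, not Pipecat's API:

```python
# Toy sketch: derive user turns from raw VAD samples by requiring `stop_secs`
# of continuous silence before declaring turn end. In Pipecat terms, turn
# start is roughly where UserStartedSpeakingFrame would be emitted and turn
# end where UserStoppedSpeakingFrame would be; illustration only.

def detect_turns(vad_events, stop_secs=0.8):
    """vad_events: ordered (timestamp_secs, is_speech) samples.
    Returns a list of (turn_start, turn_end) times."""
    turns = []
    turn_start = None
    last_speech = None
    for t, is_speech in vad_events:
        if is_speech:
            if turn_start is None:
                turn_start = t        # turn start triggered by VAD
            last_speech = t
        elif turn_start is not None and t - last_speech >= stop_secs:
            turns.append((turn_start, last_speech))  # turn end after silence
            turn_start = None
    return turns

events = [(0.0, True), (0.5, True), (0.6, False), (1.5, False),
          (2.0, True), (2.2, True), (3.5, False)]
print(detect_turns(events, stop_secs=0.8))
# → [(0.0, 0.5), (2.0, 2.2)]
```

A smaller `stop_secs` ends turns sooner, which is why the Smart Turn guidance in the diff pairs the model with a low VAD `stop_secs`.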
`guides/learn/text-to-speech.mdx` (10 additions, 2 deletions)

````diff
@@ -181,7 +181,11 @@ For TTS-specific text preprocessing, you can provide custom text transforms that
 Text transforms are registered directly on the TTS service instance via the `add_text_transformer()` method or during initialization using the `text_transforms` parameter.

 <Note>
-  The intentions of text transforms are meant to be TTS-specific modifications that do not affect the underlying LLM text or context. That said, since the context aggregator attempts to base its context on what was actually spoken, for services that support word timestamps, like Cartesia, ElevenLabs, and Rime,these transforms will modify the context as they modify what is spoken.
+  The intentions of text transforms are meant to be TTS-specific modifications
+  that do not affect the underlying LLM text or context. That said, since the
+  context aggregator attempts to base its context on what was actually spoken,
+  for services that support word timestamps, like Cartesia, ElevenLabs, and
+  Rime,these transforms will modify the context as they modify what is spoken.
 </Note>

 ```python
@@ -227,7 +231,11 @@ tts.add_text_transformer(replace_acronyms, "*") # Apply to all text

 ### Text Filters

-<Warning>Text filters are no longer the preferred method for text preprocessing and will be deprecated in future releases. Instead, you should use one of the methods described above.</Warning>
+<Warning>
+  Text filters are no longer the preferred method for text preprocessing and
+  will be deprecated in future releases. Instead, you should use one of the
+  methods described above.
+</Warning>
````
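The hunk context above registers a transform with `tts.add_text_transformer(replace_acronyms, "*")`. A self-contained sketch of what such a `replace_acronyms` function might look like (the function body is our own illustration; only the registration call appears in the diff):

```python
# Hypothetical TTS text transform: space out all-caps acronyms so the TTS
# reads them letter by letter. Only the spoken text changes, though per the
# Note above, services with word timestamps reflect this in context too.
import re

def replace_acronyms(text: str) -> str:
    """Turn runs of 2+ capital letters into spaced letters, e.g. NASA -> N A S A."""
    return re.sub(r"\b([A-Z]{2,})\b", lambda m: " ".join(m.group(1)), text)

print(replace_acronyms("Send the URL to NASA"))
# → 'Send the U R L to N A S A'

# Registration, as shown in the diff context (assumes a configured `tts` service):
# tts.add_text_transformer(replace_acronyms, "*")  # apply to all text
```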
`guides/learn/transports.mdx` (3 additions, 1 deletion)

```diff
@@ -131,7 +131,9 @@ params = TransportParams(
 </Note>

 <Tip>
-  For advanced turn detection (like Smart Turn), configure [User Turn Strategies](/server/utilities/user-turn-strategies) on the context aggregator instead of using the transport's turn_analyzer parameter.
+  For advanced turn detection (like Smart Turn), configure [User Turn
+  Strategies](/server/utilities/user-turn-strategies) on the context aggregator
+  instead of using the transport's turn_analyzer parameter.
```
Additional file (name not captured):

```diff
-  DEPRECATED: This parameter is deprecated. Configure interruption behavior via [User Turn Strategies](/server/utilities/user-turn-strategies) instead. See the `enable_interruptions` parameter on start strategies.
+  DEPRECATED: This parameter is deprecated. Configure interruption behavior
+  via [User Turn Strategies](/server/utilities/user-turn-strategies) instead.
+  See the `enable_interruptions` parameter on start strategies.
 </Warning>
-Whether to allow pipeline interruptions. When enabled, a user's speech will
-immediately interrupt the bot's response.
+Whether to allow pipeline interruptions. When enabled, a user's speech will immediately
+interrupt the bot's response.
```
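The deprecation text above points `enable_interruptions` at start strategies. Behaviorally, an interruption means a user turn start cancels in-flight bot output; a toy model of that documented behavior (our own illustration, not Pipecat's classes):

```python
# Toy model of interruption handling: when a user turn starts while the bot
# is speaking and interruptions are enabled, bot output stops immediately.
# Illustration of the documented behavior only, not Pipecat code.

class BotOutput:
    def __init__(self):
        self.speaking = False
        self.interruptions = 0

    def start_speaking(self):
        self.speaking = True

    def on_user_turn_start(self, enable_interruptions=True):
        if enable_interruptions and self.speaking:
            self.speaking = False      # halt audio output right away
            self.interruptions += 1

bot = BotOutput()
bot.start_speaking()
bot.on_user_turn_start()                # user speech interrupts the bot
print(bot.speaking, bot.interruptions)  # → False 1
```

With `enable_interruptions=False`, user speech does not stop the bot's output.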
Additional file (name not captured):

```diff
-Any additional attributes to add to top-level OpenTelemetry conversation span. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
+<ParamField
+  path="additional_span_attributes"
+  type="Optional[dict]"
+  default="None"
+>
+  Any additional attributes to add to top-level OpenTelemetry conversation span.
+  See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
```
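The new `additional_span_attributes` parameter takes an `Optional[dict]` of extra attributes for the top-level conversation span. A minimal sketch of a value one might pass (the keys are invented examples, not prescribed names):

```python
# Example value for additional_span_attributes (keys are illustrative only);
# per the diff, it defaults to None and is attached to the conversation span.
additional_span_attributes = {
    "deployment.environment": "staging",
    "customer.id": "acme-42",
}
```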