2 changes: 1 addition & 1 deletion getting-started/quickstart.mdx
@@ -268,7 +268,7 @@ The context aggregator automatically collects user messages (after speech-to-tex
When building web or mobile clients, you can use [Pipecat's client SDKs](/client/introduction) that communicate with your bot via the [RTVI (Real-Time Voice Interaction) protocol](/client/rtvi-standard). In our quickstart example, we initialize the RTVI processor to handle client-server messaging and events:

```python
rtvi = RTVIProcessor(config=RTVIConfig(config=[]))
rtvi = RTVIProcessor()
```

See below for how we incorporate the RTVI processor into the pipeline.
2 changes: 2 additions & 0 deletions guides/fundamentals/custom-frame-processor.mdx
@@ -34,6 +34,7 @@ class MetricsFrameLogger(FrameProcessor):
This frame processor looks for `MetricsFrame`s. When it sees one, it formats the data and logs it.

It uses this `format_metrics` function:

```python
def format_metrics(metrics, indent=0):
lines = []
@@ -78,6 +79,7 @@ pipeline = Pipeline(
metrics_frame_processor, # Our custom FrameProcessor that pretty prints metrics frames
]
)
```

With this positioning, the `MetricsFrameLogger` FrameProcessor will receive every `MetricsFrame` in the pipeline.

11 changes: 8 additions & 3 deletions guides/fundamentals/user-input-muting.mdx
@@ -38,7 +38,11 @@ This prevents user speech from being processed during muted periods.
Pipecat provides several built-in strategies for determining when to mute user input:

<CardGroup cols={2}>
<Card title="FirstSpeechUserMuteStrategy" icon="microphone-slash" iconType="duotone">
<Card
title="FirstSpeechUserMuteStrategy"
icon="microphone-slash"
iconType="duotone"
>
Mute only during the bot's first speech utterance. Useful for introductions
when you want the bot to complete its greeting before the user can speak.
</Card>
@@ -61,8 +65,9 @@ Pipecat provides several built-in strategies for determining when to mute user i
</CardGroup>

<Warning>
The `FirstSpeechUserMuteStrategy` and `MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not
be used together as they handle the first bot speech differently.
The `FirstSpeechUserMuteStrategy` and
`MuteUntilFirstBotCompleteUserMuteStrategy` strategies should not be used
together as they handle the first bot speech differently.
</Warning>

## Basic Implementation
5 changes: 4 additions & 1 deletion guides/learn/context-management.mdx
@@ -68,7 +68,10 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
```

<Note>
The context aggregator also supports configuring [user turn strategies](/server/utilities/user-turn-strategies) and [user mute strategies](/server/utilities/user-mute-strategies) via `LLMUserAggregatorParams`.
The context aggregator also supports configuring [user turn
strategies](/server/utilities/user-turn-strategies) and [user mute
strategies](/server/utilities/user-mute-strategies) via
`LLMUserAggregatorParams`.
</Note>

**About LLMContext:**
4 changes: 2 additions & 2 deletions guides/learn/pipeline.mdx
@@ -160,8 +160,8 @@ Understanding data flow is crucial for building effective pipelines:
4. `tts` converts text frames to `TTSAudioRawFrame`s, `AggregatedTextFrame`s, and `TTSTextFrame`s
5. `transport.output()` creates `OutputAudioRawFrame`s and sends audio back to user

* Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
out of the TTS.
- Note: An `LLMTextProcessor` can sit between the `llm` and `tts` to pre-aggregate `LLMTextFrame`s into `AggregatedTextFrame`s. This simply moves the aggregation step
out of the TTS.
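
As a rough illustration of the aggregation step described in the note, the sketch below groups streamed text chunks into sentence-sized strings. It is plain Python and does not use Pipecat: `aggregate_sentences` is a hypothetical helper, and splitting on sentence punctuation is an assumption, since the aggregation rule that `LLMTextProcessor` or the TTS service applies is not stated here.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


def aggregate_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper (not a Pipecat API): buffer streamed text
    chunks and yield one string per complete sentence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently held in the buffer.
        while (match := _SENTENCE_END.search(buffer)) is not None:
            end = match.start() + 1  # keep the punctuation mark
            yield buffer[:end].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    for sentence in aggregate_sentences(["Hel", "lo there. How", " are you", "? Fine"]):
        print(sentence)
```

Whichever component does this buffering, the downstream stage receives whole units of text rather than individual tokens.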

### Frame Propagation

10 changes: 8 additions & 2 deletions guides/learn/speech-input.mdx
@@ -10,11 +10,13 @@ A key to natural conversations is properly detecting when the user starts and st
Pipecat uses [user turn strategies](/server/utilities/user-turn-strategies) to determine when user turns start and end. These strategies can use different techniques:

**For detecting turn start:**

- Voice Activity Detection (VAD): triggers when speech is detected
- Transcription-based (fallback): triggers when transcription is received but VAD didn't detect speech
- Minimum words: waits for a minimum number of spoken words before triggering

**For detecting turn end:**

- Transcription-based: analyzes transcription to determine when the user is done
- Turn detection model: uses AI to understand if the user has finished their thought
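
The turn-start techniques listed above can be sketched as a single decision function. This is a toy model in plain Python, not Pipecat code: `TurnStartState`, `should_start_turn`, and the way the three signals are combined are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TurnStartState:
    """Hypothetical snapshot of what has been observed so far."""
    vad_speech_detected: bool = False
    transcript: str = ""


def should_start_turn(state: TurnStartState, min_words: int = 0) -> bool:
    """Toy decision rule (an assumption, not Pipecat's logic)."""
    words = len(state.transcript.split())
    if min_words > 0:
        # Minimum words: wait until enough words have been spoken.
        return words >= min_words
    if state.vad_speech_detected:
        # VAD: speech was detected in the audio.
        return True
    # Transcription fallback: text arrived although VAD missed the speech.
    return words > 0
```

For example, `should_start_turn(TurnStartState(transcript="yes okay"), min_words=3)` stays `False` until a third word arrives, which is how a minimum-words rule can ignore brief affirmations.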

@@ -90,7 +92,9 @@ While VAD detects speech vs. silence, it can't understand linguistic context. A
2. **Turn End**: When the stop strategy determines the user is done, it emits `UserStoppedSpeakingFrame`

<Note>
VAD also emits its own frames (`VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection. These are inputs to the turn strategies, not the final turn decisions.
VAD also emits its own frames (`VADUserStartedSpeakingFrame`,
`VADUserStoppedSpeakingFrame`) which indicate raw speech/silence detection.
These are inputs to the turn strategies, not the final turn decisions.
</Note>

### Detecting Turn End
@@ -141,6 +145,7 @@ When using Smart Turn, configure VAD with a low `stop_secs` (0.2) so the model c
Interruptions stop the bot when the user starts speaking. This is controlled by the `enable_interruptions` parameter on start strategies (enabled by default).

When a user turn starts with interruptions enabled:

1. Bot immediately stops speaking
2. Pending audio and text are cleared
3. Pipeline is ready for new user input
@@ -154,7 +159,8 @@ start_strategy = VADUserTurnStartStrategy(enable_interruptions=False)
```

<Note>
Keep interruptions enabled (default) for natural conversations. This enables users to interrupt the bot mid-sentence, just like human conversations.
Keep interruptions enabled (default) for natural conversations. This enables
users to interrupt the bot mid-sentence, just like human conversations.
</Note>

## Best Practices
12 changes: 10 additions & 2 deletions guides/learn/text-to-speech.mdx
@@ -181,7 +181,11 @@ For TTS-specific text preprocessing, you can provide custom text transforms that
Text transforms are registered directly on the TTS service instance via the `add_text_transformer()` method or during initialization using the `text_transforms` parameter.

<Note>
Text transforms are intended as TTS-specific modifications that do not affect the underlying LLM text or context. That said, because the context aggregator tries to base its context on what was actually spoken, for services that support word timestamps (such as Cartesia, ElevenLabs, and Rime), these transforms will modify the context as they modify what is spoken.
Text transforms are intended as TTS-specific modifications that do not affect
the underlying LLM text or context. That said, because the context aggregator
tries to base its context on what was actually spoken, for services that
support word timestamps (such as Cartesia, ElevenLabs, and Rime), these
transforms will modify the context as they modify what is spoken.
</Note>

```python
@@ -227,7 +231,11 @@ tts.add_text_transformer(replace_acronyms, "*") # Apply to all text

### Text Filters

<Warning>Text filters are no longer the preferred method for text preprocessing and will be deprecated in future releases. Instead, you should use one of the methods described above.</Warning>
<Warning>
Text filters are no longer the preferred method for text preprocessing and
will be deprecated in future releases. Instead, you should use one of the
methods described above.
</Warning>

Apply preprocessing to text before synthesis:

4 changes: 3 additions & 1 deletion guides/learn/transports.mdx
@@ -131,7 +131,9 @@ params = TransportParams(
</Note>

<Tip>
For advanced turn detection (like Smart Turn), configure [User Turn Strategies](/server/utilities/user-turn-strategies) on the context aggregator instead of using the transport's `turn_analyzer` parameter.
For advanced turn detection (like Smart Turn), configure [User Turn
Strategies](/server/utilities/user-turn-strategies) on the context aggregator
instead of using the transport's `turn_analyzer` parameter.
</Tip>

<Card
8 changes: 5 additions & 3 deletions server/pipeline/pipeline-params.mdx
@@ -30,10 +30,12 @@ task = PipelineTask(pipeline, params=params)

<ParamField path="allow_interruptions" type="bool" default="False">
<Warning>
DEPRECATED: This parameter is deprecated. Configure interruption behavior via [User Turn Strategies](/server/utilities/user-turn-strategies) instead. See the `enable_interruptions` parameter on start strategies.
DEPRECATED: This parameter is deprecated. Configure interruption behavior
via [User Turn Strategies](/server/utilities/user-turn-strategies) instead.
See the `enable_interruptions` parameter on start strategies.
</Warning>
Whether to allow pipeline interruptions. When enabled, a user's speech will
immediately interrupt the bot's response.
Whether to allow pipeline interruptions. When enabled, a user's speech will immediately
interrupt the bot's response.
</ParamField>

<ParamField path="audio_in_sample_rate" type="int" default="16000">
18 changes: 13 additions & 5 deletions server/pipeline/pipeline-task.mdx
@@ -80,19 +80,27 @@ await runner.run(task)
</ParamField>

<ParamField path="enable_tracing" type="bool" default="False">
Whether to enable OpenTelemetry tracing. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
Whether to enable OpenTelemetry tracing. See [The OpenTelemetry
guide](/server/utilities/opentelemetry) for details.
</ParamField>

<ParamField path="enable_turn_tracking" type="bool" default="False">
Whether to enable turn tracking. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
Whether to enable turn tracking. See [The OpenTelemetry
guide](/server/utilities/opentelemetry) for details.
</ParamField>

<ParamField path="conversation_id" type="Optional[str]" default="None">
Custom ID for the conversation. If not provided, a UUID will be generated. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
Custom ID for the conversation. If not provided, a UUID will be generated. See
[The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
</ParamField>

<ParamField path="additional_span_attributes" type="Optional[dict]" default="None">
Any additional attributes to add to the top-level OpenTelemetry conversation span. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
<ParamField
path="additional_span_attributes"
type="Optional[dict]"
default="None"
>
Any additional attributes to add to the top-level OpenTelemetry conversation
span. See [The OpenTelemetry guide](/server/utilities/opentelemetry) for details.
</ParamField>
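
The `conversation_id` default described above (use the caller's ID, otherwise generate a UUID) can be sketched in a few lines. `resolve_conversation_id` is a hypothetical helper rather than a Pipecat function, and the choice of `uuid4` is an assumption, since the UUID version is not specified here.

```python
import uuid
from typing import Optional


def resolve_conversation_id(conversation_id: Optional[str] = None) -> str:
    """Hypothetical helper: keep a caller-supplied ID, otherwise
    generate a random UUID (uuid4 is an assumption)."""
    if conversation_id is not None:
        return conversation_id
    return str(uuid.uuid4())
```

Passing your own ID is mainly useful when traces need to be correlated with records in another system.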

## Methods
17 changes: 13 additions & 4 deletions server/utilities/external-turn-management.mdx
@@ -8,6 +8,7 @@ description: "Handle turn detection externally using UserTurnProcessor or extern
In some scenarios, turn detection happens externally, either through a dedicated processor or an external service. Pipecat provides `ExternalUserTurnStrategies`, a [user turn strategy](/server/utilities/user-turn-strategies) that defers turn handling to these external sources.

External turn management might be needed when:

- **Multiple context aggregators**: Parallel pipelines with multiple LLMs need a single, shared source of turn events
- **External services with turn detection**: Services like [Deepgram Flux](/server/services/stt/deepgram) or [Speechmatics](/server/services/stt/speechmatics) provide their own turn detection

@@ -38,17 +39,25 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
`UserTurnProcessor` is a frame processor for managing user turn lifecycle when you need a single source of turn events shared across multiple context aggregators. It emits `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and handles interruptions.

<Note>
`UserTurnProcessor` only manages user turn start and end events. It does not handle transcription aggregation; that remains the responsibility of the context aggregators.
`UserTurnProcessor` only manages user turn start and end events. It does not
handle transcription aggregation; that remains the responsibility of the
context aggregators.
</Note>

### Constructor Parameters

<ParamField path="user_turn_strategies" type="UserTurnStrategies" default="UserTurnStrategies()">
Configured strategies for starting and stopping user turns. See [User Turn Strategies](/server/utilities/user-turn-strategies) for available options.
<ParamField
path="user_turn_strategies"
type="UserTurnStrategies"
default="UserTurnStrategies()"
>
Configured strategies for starting and stopping user turns. See [User Turn
Strategies](/server/utilities/user-turn-strategies) for available options.
</ParamField>

<ParamField path="user_turn_stop_timeout" type="float" default="5.0">
Timeout in seconds to automatically stop a user turn if no stop strategy triggers.
Timeout in seconds to automatically stop a user turn if no stop strategy
triggers.
</ParamField>
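
To make the `user_turn_stop_timeout` behavior concrete, here is a minimal `asyncio` sketch of a turn that ends either when a stop signal fires or when the timeout elapses. `wait_for_turn_stop` and the `asyncio.Event` stand-in for a stop strategy are assumptions for illustration; they are not how `UserTurnProcessor` is implemented.

```python
import asyncio


async def wait_for_turn_stop(stop_event: asyncio.Event, timeout: float = 5.0) -> str:
    """Hypothetical sketch: report what ended the user turn.

    `stop_event` stands in for a stop strategy firing."""
    try:
        await asyncio.wait_for(stop_event.wait(), timeout=timeout)
        return "strategy"
    except asyncio.TimeoutError:
        # No stop strategy triggered in time, so the turn is closed anyway.
        return "timeout"


async def main() -> None:
    stop_event = asyncio.Event()
    print(await wait_for_turn_stop(stop_event, timeout=0.05))
    stop_event.set()
    print(await wait_for_turn_stop(stop_event, timeout=0.05))


if __name__ == "__main__":
    asyncio.run(main())
```

The timeout acts as a safety net so that a turn cannot stay open indefinitely when no stop strategy fires.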

### Event Handlers
5 changes: 4 additions & 1 deletion server/utilities/interruption-strategies.mdx
@@ -3,7 +3,10 @@ title: "Interruption Strategies"
description: "Configure when users can interrupt the bot to prevent unwanted interruptions from brief affirmations"
---

<Warning>DEPRECATED: Interruption strategies have been deprecated in favor of [User Turn Strategies](/server/utilities/user-turn-strategies).</Warning>
<Warning>
DEPRECATED: Interruption strategies have been deprecated in favor of [User Turn
Strategies](/server/utilities/user-turn-strategies).
</Warning>

## Overview

7 changes: 6 additions & 1 deletion server/utilities/smart-turn/fal-smart-turn.mdx
@@ -3,7 +3,12 @@ title: "Fal Smart Turn"
description: "Cloud-hosted Smart Turn detection using Fal.ai"
---

<Warning> DEPRECATED: `FalSmartTurnAnalyzer` is deprecated. Please use [LocalSmartTurnAnalyzerV3](/server/utilities/smart-turn/smart-turn-overview#local-smart-turn) instead, which provides fast CPU inference without requiring external API calls. </Warning>
<Warning>
DEPRECATED: `FalSmartTurnAnalyzer` is deprecated. Please use
[LocalSmartTurnAnalyzerV3](/server/utilities/smart-turn/smart-turn-overview#local-smart-turn)
instead, which provides fast CPU inference without requiring external API
calls.
</Warning>

## Overview

8 changes: 3 additions & 5 deletions server/utilities/smart-turn/smart-turn-overview.mdx
@@ -120,8 +120,9 @@ The `LocalSmartTurnAnalyzerV3` runs inference locally. Version 3 of the model su
Path to the Smart Turn v3 ONNX file containing the model weights. Download this from
https://huggingface.co/pipecat-ai/smart-turn-v3/tree/main

This parameter is optional, as Pipecat includes a copy of the model internally, and this
is used if the path is unset.
This parameter is optional, as Pipecat includes a copy of the model internally, and this
is used if the path is unset.

</ParamField>

<ParamField path="sample_rate" type="Optional[int]" default="None">
@@ -169,15 +170,12 @@ user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
)
```


## How It Works

Smart Turn Detection continuously analyzes audio streams to identify natural turn completion points:

1. **Audio Buffering**: The system continuously buffers audio with timestamps, maintaining a small buffer of pre-speech audio.

2. **VAD Processing**: Voice Activity Detection (using the Silero model) detects when there is a pause in the user's speech.

3. **Smart Turn Analysis**: When VAD detects a pause in speech, the Smart Turn model analyzes the audio from the most recent 8 seconds of the user's turn, and makes a decision about whether the turn is complete or incomplete.

The system includes a fallback mechanism: if a turn is classified as incomplete but silence continues for longer than `stop_secs`, the turn is automatically marked as complete.
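
The fallback rule can be written as a small predicate. This is only a sketch of the rule as stated above: `turn_is_complete` and its parameter names are hypothetical, and the real analyzer tracks silence internally rather than receiving it as an argument.

```python
def turn_is_complete(model_says_complete: bool, silence_secs: float, stop_secs: float) -> bool:
    """Hypothetical predicate combining the model's verdict with the
    silence fallback."""
    if model_says_complete:
        return True
    # Fallback: the model said "incomplete", but the silence has lasted
    # longer than stop_secs, so the turn is treated as complete anyway.
    return silence_secs > stop_secs
```

With `stop_secs=3.0`, an "incomplete" verdict followed by 1.2 seconds of silence keeps the turn open, while 3.5 seconds of silence closes it.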
7 changes: 6 additions & 1 deletion server/utilities/transcript-processor.mdx
@@ -3,7 +3,12 @@ title: "TranscriptProcessor"
description: "Factory for creating and managing conversation transcript processors with shared event handling"
---

<Warning> DEPRECATED: TranscriptProcessor has been deprecated. Use `on_user_turn_stopped` and `on_assistant_turn_stopped` events on the context aggregators to collect transcriptions. See [Transcriptions](/server/utilities/transcriptions) for details. </Warning>
<Warning>
DEPRECATED: TranscriptProcessor has been deprecated. Use
`on_user_turn_stopped` and `on_assistant_turn_stopped` events on the context
aggregators to collect transcriptions. See
[Transcriptions](/server/utilities/transcriptions) for details.
</Warning>

## Overview

1 change: 1 addition & 0 deletions server/utilities/transcriptions.mdx
@@ -8,6 +8,7 @@ description: "Collect user and assistant conversation transcripts using turn eve
Pipecat provides a straightforward way to collect conversation transcriptions using [turn events](/server/utilities/turn-events). When a user or assistant turn ends, the corresponding event includes the complete transcript for that turn.

The key events for transcription collection are:

- **`on_user_turn_stopped`** - Provides the user's complete transcript via `UserTurnStoppedMessage`
- **`on_assistant_turn_stopped`** - Provides the assistant's complete transcript via `AssistantTurnStoppedMessage`
