Commit 6f13679

[ACL-2319,DVC-930,DVC-1018,DVC-1109] Add doc for speech to speech feature
- incorporate translated speech feature
- break up and extend overview page
- extend updates to T&C
1 parent 0911035 commit 6f13679

4 files changed

Lines changed: 678 additions & 208 deletions

api-reference/openapi.yaml

Lines changed: 185 additions & 8 deletions
@@ -3,7 +3,7 @@ info:
  title: DeepL API Documentation
  description: |-
  The DeepL API provides programmatic access to DeepL’s language AI technology.
-
+
  Note: this OpenAPI spec is embedded into our API documentation and has shortened descriptions.
  termsOfService: https://www.deepl.com/pro-license
  contact:
@@ -2329,6 +2329,8 @@ paths:
  required:
  - source_media_content_type
  properties:
+ message_format:
+ $ref: '#/components/schemas/VoiceMessageFormat'
  source_media_content_type:
  $ref: '#/components/schemas/VoiceSourceMediaContentType'
  source_language:
@@ -2337,8 +2339,12 @@ paths:
  $ref: '#/components/schemas/VoiceSourceLanguageMode'
  target_languages:
  $ref: '#/components/schemas/VoiceTargetLanguages'
- message_format:
- $ref: '#/components/schemas/VoiceMessageFormat'
+ target_media_languages:
+ $ref: '#/components/schemas/VoiceTargetMediaLanguages'
+ target_media_content_type:
+ $ref: '#/components/schemas/VoiceTargetMediaContentType'
+ target_media_voice:
+ $ref: '#/components/schemas/VoiceTargetMediaVoice'
  glossary_id:
  $ref: '#/components/schemas/GlossaryId'
  formality:
@@ -2362,6 +2368,39 @@ paths:
  message_format: 'msgpack'
  glossary_id: 'def3a26b-3e84-45b3-84ae-0c0aaf3525f7'
  formality: 'formal'
+ with_tts:
+ summary: With translated audio (default format)
+ value:
+ source_media_content_type: 'audio/ogg;codecs=opus'
+ source_language: 'en'
+ target_languages: ['de', 'fr', 'es']
+ target_media_languages: ['de', 'fr']
+ target_media_content_type: 'audio/webm;codecs=opus'
+ target_media_voice: 'female'
+ with_tts_short_form:
+ summary: With translated audio using short-form MIME types
+ value:
+ source_media_content_type: 'audio/webm'
+ source_language: 'en'
+ target_languages: ['de', 'es']
+ target_media_languages: ['de', 'es']
+ target_media_content_type: 'audio/ogg'
+ with_tts_high_quality:
+ summary: With translated audio using high-quality PCM
+ value:
+ source_media_content_type: 'audio/pcm;encoding=s16le;rate=16000'
+ source_language: 'en'
+ target_languages: ['de']
+ target_media_languages: ['de']
+ target_media_content_type: 'audio/pcm;encoding=s16le;rate=24000'
+ with_tts_raw_opus:
+ summary: With translated audio using raw Opus
+ value:
+ source_media_content_type: 'audio/pcm;encoding=s16le;rate=16000'
+ source_language: 'en'
+ target_languages: ['de']
+ target_media_languages: ['de']
+ target_media_content_type: 'audio/opus'
  responses:
  200:
  description: Successfully obtained streaming URL and token.
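
Review note: the request examples added in this hunk can be reproduced client-side with a small builder. The following is a minimal sketch, not DeepL's client code. `build_stream_request` is a hypothetical helper name, the field names and values are copied from the `with_tts` example above, and no endpoint or HTTP call is shown because the diff does not include one.

```python
import json

ALLOWED_VOICES = {"male", "female"}  # enum values of VoiceTargetMediaVoice in this diff


def build_stream_request(source_media_content_type, source_language=None,
                         target_languages=(), target_media_languages=(),
                         target_media_content_type=None, target_media_voice=None):
    """Hypothetical helper: assemble the request body, omitting unset optional fields."""
    if target_media_voice is not None and target_media_voice not in ALLOWED_VOICES:
        raise ValueError(f"unsupported target_media_voice: {target_media_voice!r}")
    # source_media_content_type is the only field listed under `required`.
    body = {"source_media_content_type": source_media_content_type}
    optional = {
        "source_language": source_language,
        "target_languages": list(target_languages) or None,
        "target_media_languages": list(target_media_languages) or None,
        "target_media_content_type": target_media_content_type,
        "target_media_voice": target_media_voice,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


# Values copied from the `with_tts` example added in this hunk.
body = build_stream_request(
    "audio/ogg;codecs=opus",
    source_language="en",
    target_languages=["de", "fr", "es"],
    target_media_languages=["de", "fr"],
    target_media_content_type="audio/webm;codecs=opus",
    target_media_voice="female",
)
print(json.dumps(body, indent=2))
```

Leaving unset fields out of the body, rather than sending nulls, lets the server-side defaults documented in the schemas apply.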
@@ -4741,12 +4780,21 @@ components:
  Message encoding format for WebSocket communication. Determines how messages are serialized and transmitted.
  Using `json`, messages are JSON-encoded and sent as TEXT WebSocket frames. All binary fields (such as audio data) are base64-encoded strings.
  Using `msgpack`, messages are MessagePack-encoded and sent as BINARY WebSocket frames. All binary fields (such as audio data) contain raw binary data.
+
+ For more details, see [Message Encoding](/api-reference/voice#message-encoding).
  type: string
  enum:
  - json
  - msgpack
  default: json
  example: json
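
Review note: the `json` rule in the `VoiceMessageFormat` description (binary fields travel as base64 strings inside TEXT frames) can be illustrated with the standard library alone. This is a sketch under assumptions: the helper names and the message shape (`type`, `data`) are hypothetical, since the diff does not define any message fields. With `msgpack`, the same dictionary would instead be handed to a MessagePack encoder with the raw bytes left untouched.

```python
import base64
import json


def encode_json_frame(message):
    """Serialize a message dict for a TEXT frame; bytes values become base64 strings."""
    encoded = {
        key: base64.b64encode(value).decode("ascii")
        if isinstance(value, (bytes, bytearray)) else value
        for key, value in message.items()
    }
    return json.dumps(encoded)


def decode_json_frame(frame, binary_fields):
    """Parse a TEXT frame and base64-decode the fields named in `binary_fields`."""
    message = json.loads(frame)
    for key in binary_fields:
        if key in message:
            message[key] = base64.b64decode(message[key])
    return message


# Hypothetical message shape; the real field names are not part of this diff.
chunk = {"type": "example_audio_chunk", "data": b"\x00\x01\x02\xff"}
frame = encode_json_frame(chunk)
restored = decode_json_frame(frame, binary_fields={"data"})
print(frame)
```

The decoder needs to be told which fields are binary, because a base64 string is indistinguishable from ordinary text once it is inside JSON.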
+ VoiceTargetMediaVoice:
+ description: (EAP) Target audio voice selection for synthesized speech. The default voice is language dependent.
+ type: string
+ enum:
+ - male
+ - female
+ example: female
  VoiceSourceMediaContentType:
  type: string
  description: "
@@ -4767,10 +4815,10 @@ components:
  | `audio/ogg;codecs=opus` | Ogg (ogg/oga) | OPUS |\n
  | `audio/pcm;encoding=alaw;rate=8000` | - | PCM A-Law 8000 Hz (G.711) |\n
  | `audio/pcm;encoding=ulaw;rate=8000` | - | PCM µ-Law 8000 Hz (G.711) |\n
- | `audio/pcm;encoding=s16le;rate=8000` | - | PCM signed 16-bit little-endian, 8000 Hz |\n
- | `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian, 16000 Hz |\n
- | `audio/pcm;encoding=s16le;rate=44100` | - | PCM signed 16-bit little-endian, 44100 Hz |\n
- | `audio/pcm;encoding=s16le;rate=48000` | - | PCM signed 16-bit little-endian, 48000 Hz |\n
+ | `audio/pcm;encoding=s16le;rate=8000` | - | PCM signed 16-bit little-endian 8000 Hz |\n
+ | `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian 16000 Hz |\n
+ | `audio/pcm;encoding=s16le;rate=44100` | - | PCM signed 16-bit little-endian 44100 Hz |\n
+ | `audio/pcm;encoding=s16le;rate=48000` | - | PCM signed 16-bit little-endian 48000 Hz |\n
  | `audio/webm;codecs=opus` | WebM (webm) | OPUS |\n
  | `audio/x-matroska;codecs=aac` | Matroska (mkv/mka) | AAC |\n
  | `audio/x-matroska;codecs=flac` | Matroska (mkv/mka) | FLAC |\n
@@ -4812,24 +4860,47 @@ components:
  description: >
  The source language of the audio stream. It can be left empty or must be one of the
  supported Voice API source languages and comply with IETF BCP 47 language tags.
+
+ Note: Some source transcription languages are provided through external service partners.
+ See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
  enum:
- - de
+ - ar
+ - bg
+ - bn
  - cs
+ - da
+ - de
+ - el
  - en
  - es
+ - et
+ - fi
  - fr
+ - ga
+ - he
+ - hr
+ - hu
  - id
  - it
  - ja
  - ko
+ - lt
+ - lv
+ - mt
+ - nb
  - nl
  - pl
  - pt
  - ro
  - ru
+ - sk
+ - sl
  - sv
+ - th
+ - tl
  - tr
  - uk
+ - vi
  - zh
  default:
  example: en
@@ -4849,11 +4920,13 @@ components:
  description: >
  List of target languages for translation. The stream will emit translations for each language.
  The maximum allowed target languages per stream is 5. Language identifiers must comply with IETF BCP 47.
+ See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
  items:
  type: string
  enum:
  - ar
  - bg
+ - bn
  - cs
  - da
  - de
@@ -4865,14 +4938,17 @@ components:
  - et
  - fi
  - fr
+ - ga
  - he
+ - hr
  - hu
  - id
  - it
  - ja
  - ko
  - lt
  - lv
+ - mt
  - nb
  - nl
  - pl
@@ -4885,6 +4961,7 @@ components:
  - sl
  - sv
  - th
+ - tl
  - tr
  - uk
  - vi
@@ -4895,6 +4972,106 @@ components:
  maxItems: 5
  default: []
  example: ["de", "fr", "es"]
+ VoiceTargetMediaLanguages:
+ type: array
+ description: >
+ (EAP) List of target languages for which to generate synthesized audio.
+ Languages specified here will automatically be added to target_languages if not already present,
+ ensuring you receive both text translation and audio synthesis for these languages.
+ If omitted, only text transcription and translation will be provided (no audio synthesis).
+ The maximum allowed target media languages per stream is 5.
+ Language identifiers must comply with IETF BCP 47.
+
+ Note: Some translated audio languages are provided through external service partners.
+ See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
+ items:
+ type: string
+ enum:
+ - ar
+ - bg
+ - cs
+ - da
+ - de
+ - el
+ - en
+ - en-GB
+ - en-US
+ - es
+ - fi
+ - fr
+ - hu
+ - id
+ - it
+ - ja
+ - ko
+ - nb
+ - nl
+ - pl
+ - pt
+ - pt-BR
+ - pt-PT
+ - ro
+ - ru
+ - sk
+ - sv
+ - tr
+ - uk
+ - vi
+ - zh
+ - zh-HANS
+ - zh-HANT
+ maxItems: 5
+ default: []
+ example: ["de", "en-GB"]
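
Review note: the `VoiceTargetMediaLanguages` description says media languages are added to `target_languages` when missing. A client can mirror that rule to predict which text translations a stream will emit. The following is a sketch under assumptions: `effective_target_languages` is a hypothetical name, identifiers are compared as exact strings (so `en` and `en-GB` count as different), the result order is a choice made here, and the diff does not say whether the merged list is itself capped at 5.

```python
MAX_MEDIA_LANGUAGES = 5  # maxItems of VoiceTargetMediaLanguages in this diff


def effective_target_languages(target_languages, target_media_languages):
    """Return target_languages extended by any media languages it does not contain."""
    if len(target_media_languages) > MAX_MEDIA_LANGUAGES:
        raise ValueError("at most 5 target media languages are allowed per stream")
    merged = list(target_languages)
    for language in target_media_languages:
        if language not in merged:  # exact string match; an assumption, see note above
            merged.append(language)
    return merged


# "de" is already present, so only "en-GB" is appended.
merged = effective_target_languages(["de", "fr", "es"], ["de", "en-GB"])
print(merged)
```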
+ VoiceTargetMediaContentType:
+ type: string
+ description: "
+ (EAP) The audio format for synthesized target media streaming.\n
+ Specifies container, codec, and encoding parameters for the audio returned in target_media_chunk messages.\n
+ If not specified, defaults to audio/webm;codecs=opus.\n
+ Only applies when target_media_languages is specified.\n
+ \n
+ | Content Type | Container | Codec |\n
+ | :--- | :--- | :--- |\n
+ | `audio/flac` | FLAC (flac) | FLAC 24000 Hz |\n
+ | `video/mp2t;codecs=aac` | MPEG Transport Stream (Audio only) | AAC 70 kbit/s |\n
+ | `video/mp2t;codecs=opus` | MPEG Transport Stream (Audio only) | OPUS 32 kbit/s |\n
+ | `audio/ogg` | Ogg (ogg/oga) | OPUS 32 kbit/s |\n
+ | `audio/ogg;codecs=flac` | Ogg (ogg/oga) | FLAC 24000 Hz |\n
+ | `audio/ogg;codecs=opus` | Ogg (ogg/oga) | OPUS 32 kbit/s |\n
+ | `audio/opus` | - | OPUS 32 kbit/s |\n
+ | `audio/pcm;encoding=alaw;rate=8000` | - | PCM A-Law 8000 Hz (G.711) |\n
+ | `audio/pcm;encoding=ulaw;rate=8000` | - | PCM µ-Law 8000 Hz (G.711) |\n
+ | `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian 16000 Hz |\n
+ | `audio/pcm;encoding=s16le;rate=24000` | - | PCM signed 16-bit little-endian 24000 Hz |\n
+ | `audio/webm` | WebM (webm) | OPUS 32 kbit/s |\n
+ | `audio/webm;codecs=opus` | WebM (webm) | OPUS 32 kbit/s |\n
+ | `audio/x-matroska;codecs=aac` | Matroska (mkv/mka) | AAC 70 kbit/s |\n
+ | `audio/x-matroska;codecs=flac` | Matroska (mkv/mka) | FLAC 24000 Hz |\n
+ | `audio/x-matroska;codecs=opus` | Matroska (mkv/mka) | OPUS 32 kbit/s |\n
+ \n
+ We recommend the following formats as good tradeoffs between quality and bandwidth:\n
+ - OPUS (WebM): 32 kbps, recommended for low bandwidth scenarios (default)\n
+ - PCM 24kHz: 384 kbps, high quality"
+ enum:
+ - audio/flac
+ - video/mp2t;codecs=aac
+ - video/mp2t;codecs=opus
+ - audio/ogg
+ - audio/ogg;codecs=flac
+ - audio/ogg;codecs=opus
+ - audio/opus
+ - audio/pcm;encoding=alaw;rate=8000
+ - audio/pcm;encoding=ulaw;rate=8000
+ - audio/pcm;encoding=s16le;rate=16000
+ - audio/pcm;encoding=s16le;rate=24000
+ - audio/webm
+ - audio/webm;codecs=opus
+ - audio/x-matroska;codecs=aac
+ - audio/x-matroska;codecs=flac
+ - audio/x-matroska;codecs=opus
+ default: audio/webm;codecs=opus
+ example: audio/webm;codecs=opus
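
Review note: the 384 kbps figure quoted for PCM 24 kHz in the description follows from sample rate times bits per sample, assuming single-channel audio. The diff does not state the channel count, but mono is the value that reproduces 384 kbps. A short worked calculation, with `pcm_bitrate_kbps` as a hypothetical helper name:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bitrate in kbit/s: rate x sample width x channel count."""
    return sample_rate_hz * bits_per_sample * channels / 1000


pcm_24k = pcm_bitrate_kbps(24000)                  # s16le at 24000 Hz
pcm_16k = pcm_bitrate_kbps(16000)                  # s16le at 16000 Hz
g711 = pcm_bitrate_kbps(8000, bits_per_sample=8)   # A-law / µ-law use 8 bits per sample
opus_kbps = 32                                     # Opus bitrate listed in the table above
ratio = pcm_24k / opus_kbps

print(pcm_24k, pcm_16k, g711, ratio)
```

Under the mono assumption, the recommended PCM format needs twelve times the bandwidth of the default Opus stream, which is the tradeoff the description alludes to.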
  VoiceStreamingResponse:
  type: object
  required:
