Commit b17e26c

Merge branch 'dvc-930-acl-2319-add-voice-tts-docu' into 'main'
[ACL-2319,DVC-930,DVC-1018,DVC-1109] Add doc for speech to speech feature

See merge request deepl/clapi-track/api-docs-mirror!231
2 parents e8f46bd + 6f13679 commit b17e26c

4 files changed

Lines changed: 678 additions & 208 deletions

api-reference/openapi.yaml

Lines changed: 185 additions & 8 deletions
@@ -3,7 +3,7 @@ info:
 title: DeepL API Documentation
 description: |-
 The DeepL API provides programmatic access to DeepL’s language AI technology.
-
+
 Note: this OpenAPI spec is embedded into our API documentation and has shortened descriptions.
 termsOfService: https://www.deepl.com/pro-license
 contact:
@@ -2329,6 +2329,8 @@ paths:
 required:
 - source_media_content_type
 properties:
+message_format:
+$ref: '#/components/schemas/VoiceMessageFormat'
 source_media_content_type:
 $ref: '#/components/schemas/VoiceSourceMediaContentType'
 source_language:
@@ -2337,8 +2339,12 @@ paths:
 $ref: '#/components/schemas/VoiceSourceLanguageMode'
 target_languages:
 $ref: '#/components/schemas/VoiceTargetLanguages'
-message_format:
-$ref: '#/components/schemas/VoiceMessageFormat'
+target_media_languages:
+$ref: '#/components/schemas/VoiceTargetMediaLanguages'
+target_media_content_type:
+$ref: '#/components/schemas/VoiceTargetMediaContentType'
+target_media_voice:
+$ref: '#/components/schemas/VoiceTargetMediaVoice'
 glossary_id:
 $ref: '#/components/schemas/GlossaryId'
 formality:
@@ -2362,6 +2368,39 @@ paths:
 message_format: 'msgpack'
 glossary_id: 'def3a26b-3e84-45b3-84ae-0c0aaf3525f7'
 formality: 'formal'
+with_tts:
+summary: With translated audio (default format)
+value:
+source_media_content_type: 'audio/ogg;codecs=opus'
+source_language: 'en'
+target_languages: ['de', 'fr', 'es']
+target_media_languages: ['de', 'fr']
+target_media_content_type: 'audio/webm;codecs=opus'
+target_media_voice: 'female'
+with_tts_short_form:
+summary: With translated audio using short-form MIME types
+value:
+source_media_content_type: 'audio/webm'
+source_language: 'en'
+target_languages: ['de', 'es']
+target_media_languages: ['de', 'es']
+target_media_content_type: 'audio/ogg'
+with_tts_high_quality:
+summary: With translated audio using high-quality PCM
+value:
+source_media_content_type: 'audio/pcm;encoding=s16le;rate=16000'
+source_language: 'en'
+target_languages: ['de']
+target_media_languages: ['de']
+target_media_content_type: 'audio/pcm;encoding=s16le;rate=24000'
+with_tts_raw_opus:
+summary: With translated audio using raw Opus
+value:
+source_media_content_type: 'audio/pcm;encoding=s16le;rate=16000'
+source_language: 'en'
+target_languages: ['de']
+target_media_languages: ['de']
+target_media_content_type: 'audio/opus'
 responses:
 200:
 description: Successfully obtained streaming URL and token.
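Reviewer note: the request examples added in this hunk can be assembled programmatically. The following is a minimal Python sketch, not the official client. The function name `build_stream_request` is hypothetical; the field names come from the examples above. Merging `target_media_languages` into `target_languages` on the client side, and applying the limit of 5 to the merged list, are assumptions based on the schema descriptions later in this diff; the server may handle both differently.

```python
import json

# Limit stated in the VoiceTargetLanguages description ("maximum ... per stream is 5").
MAX_TARGET_LANGUAGES = 5


def build_stream_request(source_media_content_type, target_languages,
                         target_media_languages=(), **optional_fields):
    """Assemble a request body using the field names from the spec examples.

    Hypothetical helper: it only builds the JSON body and does not send it.
    """
    # Assumption: mirror the documented behaviour that audio languages are
    # also translated as text, keeping the caller's order and removing duplicates.
    merged = list(dict.fromkeys([*target_languages, *target_media_languages]))
    if len(merged) > MAX_TARGET_LANGUAGES:
        raise ValueError(f"at most {MAX_TARGET_LANGUAGES} target languages allowed")

    body = {
        "source_media_content_type": source_media_content_type,
        "target_languages": merged,
    }
    if target_media_languages:
        body["target_media_languages"] = list(target_media_languages)
    # Optional fields such as source_language, target_media_content_type,
    # target_media_voice, or message_format are passed through unchanged.
    body.update(optional_fields)
    return body


# Reproduces the `with_tts` example from the diff.
with_tts = build_stream_request(
    "audio/ogg;codecs=opus",
    ["de", "fr", "es"],
    target_media_languages=["de", "fr"],
    source_language="en",
    target_media_content_type="audio/webm;codecs=opus",
    target_media_voice="female",
)
print(json.dumps(with_tts, indent=2))
```

The endpoint URL and authentication are not part of this diff, so the sketch stops at building the body.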
@@ -4697,12 +4736,21 @@ components:
 Message encoding format for WebSocket communication. Determines how messages are serialized and transmitted.
 Using `json`, messages are JSON-encoded and sent as TEXT WebSocket frames. All binary fields (such as audio data) are base64-encoded strings.
 Using `msgpack`, messages are MessagePack-encoded and sent as BINARY WebSocket frames. All binary fields (such as audio data) contain raw binary data.
+
+For more details, see [Message Encoding](/api-reference/voice#message-encoding).
 type: string
 enum:
 - json
 - msgpack
 default: json
 example: json
+VoiceTargetMediaVoice:
+description: (EAP) Target audio voice selection for synthesized speech. The default voice is language dependent.
+type: string
+enum:
+- male
+- female
+example: female
 VoiceSourceMediaContentType:
 type: string
 description: "
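Reviewer note: the `json` message format described in this hunk implies that any binary field must be base64-encoded before the message is sent as a TEXT frame. A minimal Python sketch of that encoding step follows. The message shape (`hypothetical_chunk`, `data`) is invented for illustration and is not taken from the spec; only the base64 rule comes from the description above. The `msgpack` variant is omitted because it needs a third-party library.

```python
import base64
import json


def encode_json_frame(message):
    """Serialize a message for a TEXT frame, base64-encoding any bytes values."""
    def convert(value):
        if isinstance(value, (bytes, bytearray)):
            return base64.b64encode(bytes(value)).decode("ascii")
        if isinstance(value, dict):
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            return [convert(item) for item in value]
        return value

    return json.dumps(convert(message))


def decode_binary_field(encoded):
    """Recover raw bytes from a base64 string received in a JSON message."""
    return base64.b64decode(encoded)


# Hypothetical message layout, used only to exercise the helpers.
frame = encode_json_frame({"hypothetical_chunk": {"data": b"\x00\x01"}})
payload = json.loads(frame)["hypothetical_chunk"]["data"]
```

Base64 grows the payload by roughly one third, which is one reason a binary format such as `msgpack` can be preferable for audio-heavy streams.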
@@ -4723,10 +4771,10 @@ components:
 | `audio/ogg;codecs=opus` | Ogg (ogg/oga) | OPUS |\n
 | `audio/pcm;encoding=alaw;rate=8000` | - | PCM A-Law 8000 Hz (G.711) |\n
 | `audio/pcm;encoding=ulaw;rate=8000` | - | PCM µ-Law 8000 Hz (G.711) |\n
-| `audio/pcm;encoding=s16le;rate=8000` | - | PCM signed 16-bit little-endian, 8000 Hz |\n
-| `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian, 16000 Hz |\n
-| `audio/pcm;encoding=s16le;rate=44100` | - | PCM signed 16-bit little-endian, 44100 Hz |\n
-| `audio/pcm;encoding=s16le;rate=48000` | - | PCM signed 16-bit little-endian, 48000 Hz |\n
+| `audio/pcm;encoding=s16le;rate=8000` | - | PCM signed 16-bit little-endian 8000 Hz |\n
+| `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian 16000 Hz |\n
+| `audio/pcm;encoding=s16le;rate=44100` | - | PCM signed 16-bit little-endian 44100 Hz |\n
+| `audio/pcm;encoding=s16le;rate=48000` | - | PCM signed 16-bit little-endian 48000 Hz |\n
 | `audio/webm;codecs=opus` | WebM (webm) | OPUS |\n
 | `audio/x-matroska;codecs=aac` | Matroska (mkv/mka) | AAC |\n
 | `audio/x-matroska;codecs=flac` | Matroska (mkv/mka) | FLAC |\n
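Reviewer note: the content type values in this table follow the usual `type/subtype;param=value` layout, so a client can split them into a media type and parameters before configuring its audio pipeline. A minimal Python sketch, assuming well-formed values like those listed above; `parse_content_type` is a hypothetical helper, not part of any DeepL library.

```python
def parse_content_type(value):
    """Split 'type/subtype;key=value;...' into (media_type, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    # Each parameter is expected to be 'key=value'; empty segments are skipped.
    parameters = dict(param.split("=", 1) for param in raw_params if param)
    return media_type.lower(), parameters


media_type, params = parse_content_type("audio/pcm;encoding=s16le;rate=16000")
sample_rate_hz = int(params["rate"])
```

Short-form values without parameters, such as `audio/webm`, yield an empty parameter dictionary.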
@@ -4768,24 +4816,47 @@ components:
 description: >
 The source language of the audio stream. It can be left empty or must be one of the
 supported Voice API source languages and comply with IETF BCP 47 language tags.
+
+Note: Some source transcription languages are provided through external service partners.
+See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
 enum:
-- de
+- ar
+- bg
+- bn
 - cs
+- da
+- de
+- el
 - en
 - es
+- et
+- fi
 - fr
+- ga
+- he
+- hr
+- hu
 - id
 - it
 - ja
 - ko
+- lt
+- lv
+- mt
+- nb
 - nl
 - pl
 - pt
 - ro
 - ru
+- sk
+- sl
 - sv
+- th
+- tl
 - tr
 - uk
+- vi
 - zh
 default:
 example: en
@@ -4805,11 +4876,13 @@ components:
 description: >
 List of target languages for translation. The stream will emit translations for each language.
 The maximum allowed target languages per stream is 5. Language identifiers must comply with IETF BCP 47.
+See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
 items:
 type: string
 enum:
 - ar
 - bg
+- bn
 - cs
 - da
 - de
@@ -4821,14 +4894,17 @@ components:
 - et
 - fi
 - fr
+- ga
 - he
+- hr
 - hu
 - id
 - it
 - ja
 - ko
 - lt
 - lv
+- mt
 - nb
 - nl
 - pl
@@ -4841,6 +4917,7 @@ components:
 - sl
 - sv
 - th
+- tl
 - tr
 - uk
 - vi
@@ -4851,6 +4928,106 @@ components:
 maxItems: 5
 default: []
 example: ["de", "fr", "es"]
+VoiceTargetMediaLanguages:
+type: array
+description: >
+(EAP) List of target languages for which to generate synthesized audio.
+Languages specified here will automatically be added to target_languages if not already present,
+ensuring you receive both text translation and audio synthesis for these languages.
+If omitted, only text transcription and translation will be provided (no audio synthesis).
+The maximum allowed target media languages per stream is 5.
+Language identifiers must comply with IETF BCP 47.
+
+Note: Some translated audio languages are provided through external service partners.
+See the [supported languages table](/api-reference/voice#show-supported-languages) for details.
+items:
+type: string
+enum:
+- ar
+- bg
+- cs
+- da
+- de
+- el
+- en
+- en-GB
+- en-US
+- es
+- fi
+- fr
+- hu
+- id
+- it
+- ja
+- ko
+- nb
+- nl
+- pl
+- pt
+- pt-BR
+- pt-PT
+- ro
+- ru
+- sk
+- sv
+- tr
+- uk
+- vi
+- zh
+- zh-HANS
+- zh-HANT
+maxItems: 5
+default: []
+example: ["de", "en-GB"]
+VoiceTargetMediaContentType:
+type: string
+description: "
+(EAP) The audio format for synthesized target media streaming.\n
+Specifies container, codec, and encoding parameters for the audio returned in target_media_chunk messages.\n
+If not specified, defaults to audio/webm;codecs=opus.\n
+Only applies when target_media_languages is specified.\n
+\n
+| Content Type | Container | Codec |\n
+| :--- | :--- | :--- |\n
+| `audio/flac` | FLAC (flac) | FLAC 24000 Hz |\n
+| `video/mp2t;codecs=aac` | MPEG Transport Stream (Audio only) | AAC 70 kbit/s |\n
+| `video/mp2t;codecs=opus` | MPEG Transport Stream (Audio only) | OPUS 32 kbit/s |\n
+| `audio/ogg` | Ogg (ogg/oga) | OPUS 32 kbit/s |\n
+| `audio/ogg;codecs=flac` | Ogg (ogg/oga) | FLAC 24000 Hz |\n
+| `audio/ogg;codecs=opus` | Ogg (ogg/oga) | OPUS 32 kbit/s |\n
+| `audio/opus` | - | OPUS 32 kbit/s |\n
+| `audio/pcm;encoding=alaw;rate=8000` | - | PCM A-Law 8000 Hz (G.711) |\n
+| `audio/pcm;encoding=ulaw;rate=8000` | - | PCM µ-Law 8000 Hz (G.711) |\n
+| `audio/pcm;encoding=s16le;rate=16000` | - | PCM signed 16-bit little-endian 16000 Hz |\n
+| `audio/pcm;encoding=s16le;rate=24000` | - | PCM signed 16-bit little-endian 24000 Hz |\n
+| `audio/webm` | WebM (webm) | OPUS 32 kbit/s |\n
+| `audio/webm;codecs=opus` | WebM (webm) | OPUS 32 kbit/s |\n
+| `audio/x-matroska;codecs=aac` | Matroska (mkv/mka) | AAC 70 kbit/s |\n
+| `audio/x-matroska;codecs=flac` | Matroska (mkv/mka) | FLAC 24000 Hz |\n
+| `audio/x-matroska;codecs=opus` | Matroska (mkv/mka) | OPUS 32 kbit/s |\n
+\n
+We recommend the following formats as good tradeoffs between quality and bandwidth:\n
+- OPUS (WebM): 32 kbps, recommended for low bandwidth scenarios (default)\n
+- PCM 24kHz: 384 kbps, high quality"
+enum:
+- audio/flac
+- video/mp2t;codecs=aac
+- video/mp2t;codecs=opus
+- audio/ogg
+- audio/ogg;codecs=flac
+- audio/ogg;codecs=opus
+- audio/opus
+- audio/pcm;encoding=alaw;rate=8000
+- audio/pcm;encoding=ulaw;rate=8000
+- audio/pcm;encoding=s16le;rate=16000
+- audio/pcm;encoding=s16le;rate=24000
+- audio/webm
+- audio/webm;codecs=opus
+- audio/x-matroska;codecs=aac
+- audio/x-matroska;codecs=flac
+- audio/x-matroska;codecs=opus
+default: audio/webm;codecs=opus
+example: audio/webm;codecs=opus
 VoiceStreamingResponse:
 type: object
 required:
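Reviewer note: the "PCM 24kHz: 384 kbps" figure in the new `VoiceTargetMediaContentType` description matches uncompressed 16-bit audio at 24000 Hz on a single channel (24000 × 16 × 1 = 384000 bit/s). The Python sketch below reproduces that arithmetic. The mono assumption is inferred from the stated figure, not confirmed by the spec, and `pcm_bitrate_kbps` is a hypothetical helper.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM in kbit/s (1 kbit = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def pcm_bytes_per_chunk(sample_rate_hz, chunk_ms, bits_per_sample=16, channels=1):
    """Payload size in bytes for one chunk of the given duration."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * chunk_ms // 1000


high_quality_kbps = pcm_bitrate_kbps(24000)   # s16le at 24000 Hz, assumed mono
g711_kbps = pcm_bitrate_kbps(8000, bits_per_sample=8)  # A-law/µ-law use 8 bits per sample
```

Under the same assumption, 384 kbps is 12 times the 32 kbit/s listed for the default Opus formats, which is the bandwidth tradeoff the description refers to.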