@@ -151,6 +151,18 @@ The term "interim result" indicates a SpeechRecognitionResult in which the final
151151 A boolean flag representing whether the speech recognition started. The initial value is <code> false</code> .
152152</dl>
153153
154+ <dl dfn-type=attribute dfn-for="SpeechRecognition">
155+ : <dfn>[[mode]]</dfn>
156+ ::
157+ A {{SpeechRecognitionMode}} enum to determine where speech recognition takes place. The initial value is <code> ondevice-preferred</code> .
158+ </dl>
159+
160+ <dl dfn-type=attribute dfn-for="SpeechRecognition">
161+ : <dfn>[[phrases]]</dfn>
162+ ::
163+ A {{SpeechRecognitionPhraseList}} representing a list of phrases for contextual biasing. The initial value is null.
164+ </dl>
165+
154166<xmp class="idl">
155167[Exposed=Window]
156168interface SpeechRecognition : EventTarget {
@@ -162,6 +174,7 @@ interface SpeechRecognition : EventTarget {
162174 attribute boolean interimResults;
163175 attribute unsigned long maxAlternatives;
164176 attribute SpeechRecognitionMode mode;
177+ attribute SpeechRecognitionPhraseList phrases;
165178
166179 // methods to drive the speech interaction
167180 undefined start();
@@ -192,7 +205,8 @@ enum SpeechRecognitionErrorCode {
192205 "network",
193206 "not-allowed",
194207 "service-not-allowed",
195- "language-not-supported"
208+ "language-not-supported",
209+ "phrases-not-supported"
196210};
197211
198212enum SpeechRecognitionMode {
@@ -254,12 +268,29 @@ dictionary SpeechRecognitionEventInit : EventInit {
254268 unsigned long resultIndex = 0;
255269 required SpeechRecognitionResultList results;
256270};
271+
272+ // The object representing a phrase for contextual biasing.
273+ [Exposed=Window]
274+ interface SpeechRecognitionPhrase {
275+ constructor(DOMString phrase, optional float boost = 1.0);
276+ readonly attribute DOMString phrase;
277+ readonly attribute float boost;
278+ };
279+
280+ // The object representing a list of phrases for contextual biasing.
281+ [Exposed=Window]
282+ interface SpeechRecognitionPhraseList {
283+ constructor(sequence<SpeechRecognitionPhrase> phrases);
284+ readonly attribute unsigned long length;
285+ SpeechRecognitionPhrase item(unsigned long index);
286+ undefined addItem(SpeechRecognitionPhrase item);
287+ undefined removeItem(unsigned long index);
288+ };
257289</xmp>
258290
259291<h4 id="speechreco-attributes">SpeechRecognition Attributes</h4>
260292
261293<dl>
262-
263294 <dt> <dfn attribute for=SpeechRecognition>lang</dfn> attribute</dt>
264295 <dd> This attribute will set the language of the recognition for the request, using a valid BCP 47 language tag. [[!BCP47]]
265296 If unset it remains unset for getting in script, but will default to use the language of the html document root element and associated hierarchy.
@@ -283,7 +314,35 @@ dictionary SpeechRecognitionEventInit : EventInit {
283314 The default value is 1.</dd>
284315
285316 <dt> <dfn attribute for=SpeechRecognition>mode</dfn> attribute</dt>
286- <dd> An enum to determine where speech recognition takes place. The default value is "ondevice-preferred".</dd>
317+ <dd>
318+ This attribute represents where speech recognition takes place.
319+ </dd>
320+ <dd>
321+ The getter steps are to return the value of {{SpeechRecognition/[[mode]]}} .
322+ </dd>
323+ <dd>
324+ The setter steps are:
325+ 1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
326+ and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
327+ throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/phrases-not-supported}}
328+ error code and abort these steps.
329+ 1. Set {{SpeechRecognition/[[mode]]}} to the given value.
330+ </dd>
331+
332+ <dt> <dfn attribute for=SpeechRecognition>phrases</dfn> attribute</dt>
333+ <dd>
334+ This attribute represents a list of phrases for contextual biasing.
335+ </dd>
336+ <dd>
337+ The getter steps are to return the value of {{SpeechRecognition/[[phrases]]}} .
338+ </dd>
339+ <dd>
340+ The setter steps are:
341+ 1. If the {{SpeechRecognitionPhraseList/length}} of the given value is greater than 0 and the system does not support contextual biasing,
342+ throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/phrases-not-supported}} error code and abort these steps.
343+ 1. Set {{SpeechRecognition/[[phrases]]}} to the given value.
344+ 1. Send a copy of {{SpeechRecognition/[[phrases]]}} to the system to initialize or update the phrases used for contextual biasing.
345+ </dd>
287346</dl>
288347
289348<p class=issue> The group has discussed whether WebRTC might be used to specify selection of audio sources and remote recognizers.
@@ -479,6 +538,9 @@ For example, some implementations may fire <a event for=SpeechRecognition>audioe
479538
480539 <dt> <dfn enum-value for=SpeechRecognitionErrorCode>"language-not-supported"</dfn> </dt>
481540 <dd> The language was not supported.</dd>
541+
542+ <dt> <dfn enum-value for=SpeechRecognitionErrorCode>"phrases-not-supported"</dfn> </dt>
543+ <dd> The speech recognition model does not support phrases for contextual biasing.</dd>
482544 </dl>
483545 </dd>
484546
@@ -557,6 +619,98 @@ For a non-continuous recognition it will hold only a single value.</p>
557619 Note that when resultIndex equals results.length, no new results are returned, this may occur when the array length decreases to remove one or more interim results.</dd>
558620</dl>
559621
622+ <h4 id="speechreco-phrase">SpeechRecognitionPhrase</h4>
623+
624+ <p> The SpeechRecognitionPhrase object represents a phrase for contextual biasing and has the following internal slots:</p>
625+
626+ <dl dfn-type=attribute dfn-for="SpeechRecognitionPhrase">
627+ : <dfn>[[phrase]]</dfn>
628+ ::
629+ A {{DOMString}} representing the text string to be boosted. The initial value is null.
630+ An empty value is allowed but should be ignored by the speech recognition model.
631+ </dl>
632+
633+ <dl dfn-type=attribute dfn-for="SpeechRecognitionPhrase">
634+ : <dfn>[[boost]]</dfn>
635+ ::
636+ A float representing approximately the natural log of how many times more likely the website considers this phrase to be
637+ than the speech recognition model would otherwise estimate.
638+ A valid boost must be a float value inside the range [0.0, 10.0], with a default value of 1.0 if not specified.
639+ A boost of 0.0 means the phrase is not boosted at all, and a higher boost means the phrase is more likely to appear.
640+ A boost of 10.0 means the phrase is extremely likely to appear and should rarely be set.
641+ </dl>
642+
643+ <dl>
644+ <dt> <dfn constructor for=SpeechRecognitionPhrase>SpeechRecognitionPhrase(|phrase|, |boost|)</dfn> constructor</dt>
645+ <dd>
646+ When this constructor is invoked, run the following steps:
647+ 1. If |boost| is smaller than 0.0 or greater than 10.0, throw a {{SyntaxError}} and abort these steps.
648+ 1. Let |phr| be a new object of type {{SpeechRecognitionPhrase}} .
649+ 1. Set |phr|.{{[[phrase]]}} to be the value of |phrase|.
650+ 1. Set |phr|.{{[[boost]]}} to be the value of |boost|.
651+ 1. Return |phr|.
652+ </dd>
653+
654+ <dt> <dfn attribute for=SpeechRecognitionPhrase>phrase</dfn> attribute</dt>
655+ <dd> This attribute returns the value of {{[[phrase]]}} .</dd>
656+
657+ <dt> <dfn attribute for=SpeechRecognitionPhrase>boost</dfn> attribute</dt>
658+ <dd> This attribute returns the value of {{[[boost]]}} .</dd>
659+ </dl>
660+
661+ <h4 id="speechreco-phraselist">SpeechRecognitionPhraseList</h4>
662+
663+ <p> The SpeechRecognitionPhraseList object holds a list of phrases for contextual biasing and has the following internal slot:</p>
664+
665+ <dl dfn-type=attribute dfn-for="SpeechRecognitionPhraseList">
666+ : <dfn>[[phrases]]</dfn>
667+ ::
668+ A list of {{SpeechRecognitionPhrase}} representing the phrases to be boosted. The initial value is an empty list.
669+ </dl>
670+
671+ <dl>
672+ <dt> <dfn constructor for=SpeechRecognitionPhraseList>SpeechRecognitionPhraseList(|phrases|)</dfn> constructor</dt>
673+ <dd>
674+ When this constructor is invoked, run the following steps:
675+ 1. Let |list| be a new object of type {{SpeechRecognitionPhraseList}} .
676+ 1. Set |list|.{{SpeechRecognitionPhraseList/[[phrases]]}} to be the value of |phrases|.
677+ 1. Return |list|.
678+ </dd>
679+
680+ <dt> <dfn attribute for=SpeechRecognitionPhraseList>length</dfn> attribute</dt>
681+ <dd>
682+ This attribute indicates the number of phrases in the list.
683+ When invoked, return the number of items in {{SpeechRecognitionPhraseList/[[phrases]]}} .
684+ </dd>
685+
686+ <dt> <dfn method for=SpeechRecognitionPhraseList>item(|index|)</dfn> method</dt>
687+ <dd>
688+ This method gets the {{SpeechRecognitionPhrase}} object at the |index| of the list.
689+ When invoked, run the following steps:
690+ 1. If |index| is smaller than 0, or greater than or equal to {{SpeechRecognitionPhraseList/length}} ,
691+ throw a {{RangeError}} and abort these steps.
692+ 1. Return the {{SpeechRecognitionPhrase}} at the |index| of {{SpeechRecognitionPhraseList/[[phrases]]}} .
693+ </dd>
694+
695+ <dt> <dfn method for=SpeechRecognitionPhraseList>addItem(|item|)</dfn> method</dt>
696+ <dd>
697+ This method adds the {{SpeechRecognitionPhrase}} object |item| to the list.
698+ When invoked, add |item| to the end of {{SpeechRecognitionPhraseList/[[phrases]]}} .
699+ The list is allowed to have multiple {{SpeechRecognitionPhrase}} objects with the same {{SpeechRecognitionPhrase/[[phrase]]}} value,
700+ and the speech recognition model should use the last {{SpeechRecognitionPhrase/[[boost]]}} value
701+ for this {{SpeechRecognitionPhrase/[[phrase]]}} in the list.
702+ </dd>
703+
704+ <dt> <dfn method for=SpeechRecognitionPhraseList>removeItem(|index|)</dfn> method</dt>
705+ <dd>
706+ This method removes the {{SpeechRecognitionPhrase}} object at the |index| of the list.
707+ When invoked, run the following steps:
708+ 1. If |index| is smaller than 0, or greater than or equal to {{SpeechRecognitionPhraseList/length}} ,
709+ throw a {{RangeError}} and abort these steps.
710+ 1. Remove the {{SpeechRecognitionPhrase}} object at the |index| of {{SpeechRecognitionPhraseList/[[phrases]]}} .
711+ </dd>
712+ </dl>
713+
560714<h3 id="tts-section">The SpeechSynthesis Interface</h3>
561715
562716<p> The SpeechSynthesis interface is the scripted web API for controlling a text-to-speech output.</p>
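
The constructor steps and list methods added in this change can be sketched outside a browser as a small TypeScript model. This is an illustrative sketch only, not the specified implementation: the names `PhraseModel`, `PhraseListModel`, `effectiveBoosts`, and `likelihoodMultiplier` are hypothetical, the built-in `SyntaxError` stands in for whatever exception type the spec text resolves to, and copying the input array in the list constructor is an assumption. The sketch reads "approximately the natural log" as a multiplier of roughly e^boost.

```typescript
// Hypothetical model of SpeechRecognitionPhrase; not the real interface.
class PhraseModel {
  readonly phrase: string;
  readonly boost: number;

  constructor(phrase: string, boost: number = 1.0) {
    // Constructor step 1: reject boosts outside [0.0, 10.0].
    if (boost < 0.0 || boost > 10.0) {
      throw new SyntaxError("boost must be within [0.0, 10.0]");
    }
    this.phrase = phrase;
    this.boost = boost;
  }
}

// Hypothetical model of SpeechRecognitionPhraseList.
class PhraseListModel {
  private phrases: PhraseModel[];

  constructor(phrases: PhraseModel[]) {
    // Assumption: the list keeps its own copy of the input sequence.
    this.phrases = [...phrases];
  }

  get length(): number {
    return this.phrases.length;
  }

  item(index: number): PhraseModel {
    this.checkIndex(index);
    return this.phrases[index];
  }

  addItem(item: PhraseModel): void {
    // Duplicates are allowed; they are resolved later in effectiveBoosts.
    this.phrases.push(item);
  }

  removeItem(index: number): void {
    this.checkIndex(index);
    this.phrases.splice(index, 1);
  }

  private checkIndex(index: number): void {
    if (index < 0 || index >= this.phrases.length) {
      throw new RangeError("index " + index + " is out of range");
    }
  }
}

// One way a recognizer might resolve the list: skip empty phrases and
// let the last boost given for a repeated phrase win.
function effectiveBoosts(list: PhraseListModel): Map<string, number> {
  const boosts = new Map<string, number>();
  for (let i = 0; i < list.length; i++) {
    const p = list.item(i);
    if (p.phrase === "") continue; // empty phrases are allowed but ignored
    boosts.set(p.phrase, p.boost); // later entries overwrite earlier ones
  }
  return boosts;
}

// Assumption: a boost is roughly ln(multiplier), so 0.0 maps to 1x (no boost).
function likelihoodMultiplier(boost: number): number {
  return Math.exp(boost);
}
```

Under this reading, a phrase added twice with boosts 3.0 and 5.0 is treated as having boost 5.0, and a default boost of 1.0 makes the phrase about e (roughly 2.7) times more likely than the model's own estimate.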