Introduce start session algorithm (#138)

beaufortfrancois · web-flow · commit 6356249e3d93 · 2025-02-18T07:41:43.000-08:00
* Introduce start session algorithm
* Nits
* Nit #2
* Use InvalidStateError instead of UnknownError
* Fix typo
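
For readers skimming the diff, the control flow this commit specifies can be sketched roughly as below. This is a minimal model, not the actual interface: `RecognitionModel`, `SessionEnvironment`, `MicPermission`, `startWithTrack`, `fireEnd`, and the `errorOrEndFired` flag are hypothetical names introduced only for illustration. The model fires `start` synchronously, whereas the spec text fires it once the system is listening. When the flag tracking "error or end has fired" is reset is also an assumption, since the commit does not say.

```typescript
// Illustrative model only. None of these names come from the Web Speech API.
type MicPermission = "granted" | "denied";

class InvalidStateError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidStateError";
  }
}

// Hypothetical stand-in for the document state and the permission request.
interface SessionEnvironment {
  documentFullyActive: boolean;
  requestMicrophone: () => MicPermission;
}

class RecognitionModel {
  private started = false; // models the [[started]] internal slot
  private errorOrEndFired = false; // assumption: tracks a fired error/end event
  private readonly env: SessionEnvironment;
  readonly firedEvents: string[] = [];

  constructor(env: SessionEnvironment) {
    this.env = env;
  }

  // start(): requestMicrophonePermission is true.
  start(): void {
    this.startSession(true);
  }

  // start(audioTrack): validate the track, then skip the permission request.
  startWithTrack(kind: string, readyState: string): void {
    if (kind !== "audio") {
      throw new InvalidStateError("track kind is not \"audio\"");
    }
    if (readyState !== "live") {
      throw new InvalidStateError("track readyState is not \"live\"");
    }
    this.startSession(false);
  }

  // Assumption: stands in for an error or end event firing on the object.
  fireEnd(): void {
    this.errorOrEndFired = true;
    this.firedEvents.push("end");
  }

  private startSession(requestMicrophonePermission: boolean): void {
    // Step 1: the associated Document must be fully active.
    if (!this.env.documentFullyActive) {
      throw new InvalidStateError("document is not fully active");
    }
    // Step 2: reject a second start while a session is still running.
    if (this.started && !this.errorOrEndFired) {
      throw new InvalidStateError("recognition has already started");
    }
    // Step 3: set [[started]].
    this.started = true;
    this.errorOrEndFired = false; // assumption: reset for the new session
    // Step 4: abort silently if the microphone permission is denied.
    if (requestMicrophonePermission && this.env.requestMicrophone() === "denied") {
      return;
    }
    // Step 5: fire the start event (synchronous here for simplicity).
    this.firedEvents.push("start");
  }
}
```

Note that in this model, as in the steps as written, a denied permission leaves the started flag set, so a later call throws until an error or end event has fired.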
diff --git a/index.bs b/index.bs
@@ -99,7 +99,7 @@ This does not preclude adding support for this as a future API enhancement, and
   User consent can include, for example:
   <ul>
     <li>User click on a visible speech input element which has an obvious graphical representation showing that it will start speech input.</li>
-    <li>Accepting a permission prompt shown as the result of a call to <code>SpeechRecognition.start</code>.</li>
+    <li>Accepting a permission prompt shown as the result of a call to <a method for=SpeechRecognition>start()</a>.</li>
     <li>Consent previously granted to always allow speech input for this web page.</li>
   </ul>
   </li>
@@ -142,6 +142,14 @@ This does not preclude adding support for this as a future API enhancement, and
 The term "final result" indicates a SpeechRecognitionResult in which the final attribute is true.
 The term "interim result" indicates a SpeechRecognitionResult in which the final attribute is false.
 
+{{SpeechRecognition}} has the following internal slots:
+
+<dl dfn-type=attribute dfn-for="SpeechRecognition">
+    : <dfn>[[started]]</dfn>
+    ::
+        A boolean flag representing whether speech recognition has started. The initial value is <code>false</code>.
+</dl>
+
 <xmp class="idl">
 [Exposed=Window]
 interface SpeechRecognition : EventTarget {
@@ -277,15 +285,19 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072
 
 <dl>
   <dt><dfn method for=SpeechRecognition>start()</dfn> method</dt>
-  <dd>When the start method is called it represents the moment in time the web application wishes to begin recognition.
-  When the speech input is streaming live through the input media stream, then this start call represents the moment in time that the service must begin to listen.
-  Once the system is successfully listening to the recognition the user agent must raise a start event.
-  If the start method is called on an already started object (that is, start has previously been called, and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired on the object), the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.</dd>
+  <dd>
+    1. Let <var>requestMicrophonePermission</var> be <code>true</code>.
+    1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
+  </dd>
 
   <dt><dfn method for=SpeechRecognition>start({{MediaStreamTrack}} audioTrack)</dfn> method</dt>
-  <dd>The overloaded start method does the same thing as the parameterless start method except it performs speech recognition on provided {{MediaStreamTrack}} instead of the input media stream.
-  If the {{MediaStreamTrack/kind}} attribute of the {{MediaStreamTrack}} is not "audio" or the {{MediaStreamTrack/readyState}} attribute is not "live", the user agent must throw an "{{InvalidStateError!!exception}}" {{DOMException}} and ignore the call.
-  Unlike the parameterless start method, the user agent does not check whether [=this=]'s [=relevant global object=]'s [=associated Document=] is [=allowed to use=] the [=policy-controlled feature=] named "<code>microphone</code>".</dd>
+  <dd>
+    1. Let <var>audioTrack</var> be the first argument.
+    1. If <var>audioTrack</var>'s {{MediaStreamTrack/kind}} attribute is NOT <code>"audio"</code>, throw an {{InvalidStateError}} and abort these steps.
+    1. If <var>audioTrack</var>'s {{MediaStreamTrack/readyState}} attribute is NOT <code>"live"</code>, throw an {{InvalidStateError}} and abort these steps.
+    1. Let <var>requestMicrophonePermission</var> be <code>false</code>.
+    1. Run the <a>start session algorithm</a> with <var>requestMicrophonePermission</var>.
+  </dd>
 
   <dt><dfn method for=SpeechRecognition>stop()</dfn> method</dt>
   <dd>The stop method represents an instruction to the recognition service to stop listening to more audio, and to try and return a result using just the audio that it has already received for this recognition.
@@ -309,6 +321,16 @@ See <a href="https://lists.w3.org/Archives/Public/public-speech-api/2012Sep/0072
 
 </dl>
 
+<p>When the <dfn>start session algorithm</dfn> with <var>requestMicrophonePermission</var> is invoked, the user agent MUST run the following steps:
+
+1. If the [=current settings object=]'s [=relevant global object=]'s [=associated Document=] is NOT [=fully active=], throw an {{InvalidStateError}} and abort these steps.
+1. If {{[[started]]}} is <code>true</code> and no <a event for=SpeechRecognition>error</a> or <a event for=SpeechRecognition>end</a> event has fired, throw an {{InvalidStateError}} and abort these steps.
+1. Set {{[[started]]}} to <code>true</code>.
+1. If <var>requestMicrophonePermission</var> is <code>true</code> and the result of running [=request permission to use=] with "<code>microphone</code>" is [=permission/"denied"=], abort these steps.
+1. Once the system is successfully listening to the recognition, [=fire an event=] named <a event for=SpeechRecognition>start</a> at [=this=].
+
+</p>
+
 <h4 id="speechreco-events">SpeechRecognition Events</h4>
 
 <p>The DOM Level 2 Event Model is used for speech recognition events.