[Webkit-unassigned] [Bug 42367] Speech input plumbing in webkit

bugzilla-daemon at webkit.org
Wed Jul 21 09:39:36 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=42367





--- Comment #9 from Darin Fisher (:fishd, Google) <fishd at chromium.org>  2010-07-21 09:39:36 PST ---
(In reply to comment #8)
> (In reply to comment #7)
> 
> Thanks for the comments; some questions/replies below.
> 
> > WebKit/chromium/public/WebViewClient.h:340
> >  +       virtual WebKit::WebSpeechInputClient* speechInputClient() { return 0; }
> > instead of introducing a new interface here, just add the methods
> > from WebSpeechInputClient to WebViewClient.
> > 
> > Then rename WebSpeechInputClientListener to WebSpeechInputListener.
> 
> I see other features such as Geolocation have an interface (WebKit::WebGeolocationService)
> implemented in the Chromium render process as a separate dispatcher-like class, which
> seemed clean. Just double-checking with you whether we are moving away from that model.

Yes, that's because WebGeolocationService has a lot of methods.  I'm OK with
creating a separate interface for speech input if you think it is going to make
things cleaner.  I would just not call it WebSpeechInputClient.  That name
suggests that it is the "client" to a WebSpeechInput interface, which does not
appear to exist.

Instead, I would rename WebSpeechInputClient to WebSpeechInputService.  Then
I would rename WebSpeechInputClientListener to just WebSpeechInputListener.

i.e., the speech input service can have listeners.
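A minimal sketch of the proposed naming (the class names come from this discussion, but every signature here is hypothetical, not the actual WebKit headers):

```cpp
#include <string>

// Sketch only: the embedder implements the service, and WebKit
// implements the listener that receives events back from it.

// WebSpeechInputListener: implemented by WebKit, so its methods are
// pure virtual.
class WebSpeechInputListener {
public:
    virtual void didCompleteRecording() = 0;
    virtual void setRecognitionResult(const std::string& result) = 0;

protected:
    virtual ~WebSpeechInputListener() {}
};

// WebSpeechInputService: implemented by the embedder; WebKit calls
// into it to drive recognition, so defaults are provided.
class WebSpeechInputService {
public:
    virtual bool startRecognition() { return false; }
    virtual void stopRecording() {}
    virtual void cancelRecognition() {}

protected:
    virtual ~WebSpeechInputService() {}
};
```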


> > WebKit/chromium/public/WebSpeechInputClientListener.h:49
> >  +      virtual void setRecognitionResult(const WebString&) { WEBKIT_ASSERT_NOT_REACHED(); }
> > didCompleteRecognition?
> 
> This method can potentially get called multiple times, if there are partial results 
> available as the user keeps speaking. The current name may be more suitable for such
> cases, is it ok to leave it as is?

OK, but please add comments explaining all of this.  I found these interfaces
to be very confusing when I first read them.


> > WebKit/chromium/public/WebSpeechInputClient.h:45
> >  +      virtual bool attachListener(WebSpeechInputClientListener*) { WEBKIT_ASSERT_NOT_REACHED(); }
> > why do we need multiple listeners?
> 
> We don't need multiple; there should be a 1:1 mapping between the client and
> the listener. But since the client is fetched from the embedder when
> WebViewImpl is created, I was not sure how this attach could be done before
> that. Any suggestions for making this better?

You should pass the WebSpeechInputListener to the method that returns
the WebSpeechInputService.  It becomes like a "constructor parameter"
in that case.
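One way to read that suggestion, sketched against the interfaces under review (method shapes are illustrative, not the real API):

```cpp
// Sketch: WebKit hands its listener to the embedder when it asks for
// the service, so the 1:1 pairing is fixed at creation time and no
// separate attachListener() call is needed.
class WebSpeechInputListener;

// Embedder-implemented; shown only so the sketch compiles.
class WebSpeechInputService {
public:
    virtual ~WebSpeechInputService() {}
    virtual void startRecognition() {}
};

class WebViewClient {
public:
    virtual ~WebViewClient() {}

    // The listener acts as a "constructor parameter" for the service.
    // A default implementation returning 0 keeps embedders without
    // speech support compiling.
    virtual WebSpeechInputService* speechInputService(WebSpeechInputListener*) { return 0; }
};
```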


> > WebKit/chromium/public/WebSpeechInputClient.h:57
> >  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> > why do we need the recordingComplete (didCompleteRecording) event if
> > there is an explicit stopRecording method?  are there other ways that
> > a recording could stop such that WebKit would require the notification
> > that recording stopped?
> 
> stopRecording() is an optional call; without it, the browser's speech recording
> 'endpointer' will detect silence in the input and stop recording automatically
> once the user stops speaking. stopRecording() is there to let users explicitly
> click the speech/mic button again, in case they are not familiar with how it
> works, or for better feature discoverability/usability.
> recordingComplete() will be issued in both these cases. The speech element in
> WebKit explicitly needs to know when recording stops so it can update the UI and
> indicate that recognition is in progress (which can take a while if done via a
> speech recognition server).

OK, please explain this in the comments.


> > WebKit/chromium/public/WebSpeechInputClient.h:57
> >  +      virtual void stopRecording() { WEBKIT_ASSERT_NOT_REACHED(); }
> > should there be some way to start recording again?
> 
> startRecognition() does that. This method is named stopRecording() and not
> stopRecognition() because the audio recorded so far is still recognized and
> the result returned to the input element.

OK, please explain this in the comments.
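The lifecycle described above could be captured with a small mock; everything here is illustrative (the real endpointer and recognizer live in the browser process):

```cpp
#include <string>

// Illustrative mock of the recording/recognition lifecycle discussed
// above; names and behavior are assumptions for the example.
class MockSpeechRecognizer {
public:
    MockSpeechRecognizer() : m_recording(false), m_recordingComplete(false) {}

    void startRecognition() { m_recording = true; }

    // stopRecording() ends audio capture only; audio recorded so far is
    // still sent for recognition, which is why it is not named
    // stopRecognition().
    void stopRecording()
    {
        if (!m_recording)
            return;
        m_recording = false;
        m_recordingComplete = true; // WebKit is told via recordingComplete()
    }

    // The endpointer may also stop recording automatically on silence,
    // so WebKit needs the completion event in both cases to update the UI.
    void detectSilence() { stopRecording(); }

    // Recognition finishes later (possibly after a server round-trip).
    std::string recognize() const
    {
        return m_recordingComplete ? std::string("recognized text") : std::string();
    }

    bool recordingComplete() const { return m_recordingComplete; }

private:
    bool m_recording;
    bool m_recordingComplete;
};
```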


> > WebKit/chromium/public/WebSpeechInputClientListener.h:44
> >  +      virtual void recordingComplete() { WEBKIT_ASSERT_NOT_REACHED(); }
> > since these methods are implemented by WebKit, they should be pure virtual.
> 
> I got comments from Marcus (above) to change from pure virtual to this model,
> as he felt this was the new hotness. I'm happy to move back to pure virtual
> if you think that is the way to go.

The rule is very simple:  use pure virtual for methods implemented by WebKit,
but provide default implementations of methods implemented by the embedder.
Please share this rule :)

The point here is that we would prefer to always use pure virtual methods,
as that allows the compiler to catch cases in which we forget to implement
a method.  However, to make it easier to roll new versions of WebKit into
Chromium, we provide default implementations for methods that Chromium
may need to implement.  That way, if the Chromium side hasn't yet implemented
the new method, things will still compile (although they might not function
perfectly).
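The rule can be shown with a toy pair of interfaces (all names here are invented for the illustration):

```cpp
// Implemented by WebKit: pure virtual, so the compiler catches a
// missing override inside WebKit itself.
class WebKitImplementedListener {
public:
    virtual ~WebKitImplementedListener() {}
    virtual void didFinish() = 0;
};

// Implemented by the embedder (Chromium): default no-op bodies, so a
// new WebKit roll that adds a method still compiles even before the
// Chromium side has implemented it.
class EmbedderImplementedService {
public:
    virtual ~EmbedderImplementedService() {}
    virtual void start() {}
    virtual void newMethodAddedLater() {} // embedder may not override this yet
};
```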


> > what if there are multiple pages independently listening for speech
> > input?  given these interfaces, it seems like one of the pages could
> > cancel the speech recognition process, hampering the efforts of the
> > other page.  or, is that dealt with at a lower level within WebCore?
> 
> In my understanding each WebViewImpl manages only a single page, but each
> page can have multiple speech-enabled input elements. The code so far
> handles the multiple-input-element case by allowing only one input element
> to record at a time. Speech input code in the browser layer should handle
> the multiple-page case, which is not done yet.

Makes sense.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


