[webkit-dev] Implementing the Speech JavaScript API

Thu Mar 1 10:18:29 PST 2012

Hi Hans,

On Thu, Mar 1, 2012 at 4:17 AM, Hans Wennborg <hans at chromium.org> wrote:
> Currently, there is some limited support for speech recognition in
> WebKit, by means of the x-webkit-speech attribute to input elements.
> We would like to continue the development of this to allow web apps to
> better utilize the possibilities of speech recognition and
> text-to-speech synthesis.
>
> In December, Google put forward a proposal [0] for a scripting-only
> subset of the API that was defined in the Speech XG Incubator Group
> Final Report [1].

I haven't read the whole report in detail (it's long!), but I noticed
a few things on a brief read-through:

1) The report introduces two new elements, the <reco> and the <tts>
elements.  It's often the case that folks designing features believe
that they need to introduce new HTML elements.  The report says, "The
reco represents a speech input in a user interface," which makes me
wonder why we don't just use <input type="speech">.

2) Similarly, it's unclear to me why we'd need a new <tts> element
rather than just a new media type for the <audio> element.

3) I didn't understand the role the builtin URI scheme plays.  What
happens if, for example, I use builtin URIs in other places that URIs
are allowed, such as <img src=...> or <a href=...> ?  Adding a new URI
scheme is even more expensive than adding a new HTML element and should
be done with care.

In any case, this mailing list isn't really the right place to debate
these details.  That's something better done in the standards arena.

> A W3C Community Group is being started to develop the specification
> [2]. Anyone interested in this is welcome to join once the group is
> available. In the meantime, the spec proposal is currently hosted at
> [3].

This proposal looks much better.  The document says it "supports the
majority of use-cases and sample code in the Incubator Group Final
Report."  It's unclear from your email which of these proposals you're
interested in implementing.  If you're planning to implement [3], that
sounds like a good plan.  If you're planning to implement [1], you
might want to get some more feedback from the broader community,
including the HTML working group.

> We would like to start implementing this behind a compile-time flag
> (ENABLE_SCRIPTED_SPEECH) and vendor prefix. A first patch is uploaded
> to the bug tracker [4].

Please consider implementing this feature as a module:
<https://trac.webkit.org/wiki/Modules>.  Everything in [3] should be
implementable in a module, and doing so will minimize the cost of this
feature on the larger project.

Thanks,
Adam

> [0]. http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [1]. http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
> [2]. http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0438.html
> [3]. http://speech-javascript-api-spec.googlecode.com/git/speechapi.html
> [4]. https://bugs.webkit.org/show_bug.cgi?id=80019