[webkit-dev] Should we create an 8-bit path from the network stack to the parser?

Luis de Bethencourt luis at debethencourt.com
Sat Mar 9 12:48:53 PST 2013


On Mar 7, 2013 10:37 PM, "Brady Eidson" <beidson at apple.com> wrote:
>
> > On Thu, Mar 7, 2013 at 2:14 PM, Michael Saboff <msaboff at apple.com>
wrote:
> >> The various tokenizers / lexers work in various ways to handle LChar
versus UChar input streams.  Most of the other tokenizers are templatized
on input character type.  In the case of HTML, the tokenizer handles one
UChar character at a time.  For 8-bit input streams, the zero extension of
an LChar to a UChar is zero cost.  There may be additional performance to be
gained by doing all other possible handling in 8 bits, but an 8-bit stream
can still contain escapes that need a UChar representation, as you point
out.  Using a character type template approach was deemed to be too
unwieldy for the HTML tokenizer.  The HTML tokenizer uses SegmentedStrings
that can consist of substrings of either LChar or UChar.  That is where
the LChar to UChar zero extension happens for an 8-bit substring.
> >>
> >> My research at the time showed that there were very few
UTF-16 only resources (<<5% IIRC), although I expect the number to grow.
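
The arrangement described above (a segmented input whose substrings are
either 8-bit or 16-bit, with the consumer always seeing a 16-bit character)
can be sketched roughly as follows.  This is only an illustration under
stated assumptions, not WebKit's actual code: SketchSegmentedString,
Segment and drain are hypothetical names, and LChar / UChar are stand-in
typedefs for an 8-bit Latin-1 unit and a 16-bit UTF-16 unit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Stand-in character types (assumption: 8-bit Latin-1 and 16-bit UTF-16 units).
using LChar = unsigned char;
using UChar = char16_t;

// One substring of the input, stored either as 8-bit or as 16-bit data.
struct Segment {
    std::variant<std::vector<LChar>, std::u16string> data;

    std::size_t size() const {
        return std::visit([](const auto& s) { return s.size(); }, data);
    }

    // Always hands back a UChar. For an 8-bit segment the cast from an
    // unsigned 8-bit value is a plain zero extension.
    UChar at(std::size_t i) const {
        return std::visit(
            [i](const auto& s) -> UChar { return static_cast<UChar>(s[i]); },
            data);
    }
};

// Hypothetical, much simplified analogue of a segmented input string.
class SketchSegmentedString {
public:
    void append(std::vector<LChar> s) {
        if (!s.empty())
            m_segments.push_back(Segment{std::move(s)});
    }
    void append(std::u16string s) {
        if (!s.empty())
            m_segments.push_back(Segment{std::move(s)});
    }

    bool isEmpty() const { return m_segments.empty(); }

    // The consumer sees one UChar at a time, whatever the segment width.
    UChar currentChar() const {
        assert(!isEmpty());
        return m_segments.front().at(m_offset);
    }

    // Move to the next character, dropping a segment once it is consumed.
    void advance() {
        assert(!isEmpty());
        if (++m_offset == m_segments.front().size()) {
            m_segments.pop_front();
            m_offset = 0;
        }
    }

private:
    std::deque<Segment> m_segments;
    std::size_t m_offset = 0;
};

// Reads the whole input the way a character-at-a-time tokenizer loop would.
std::u16string drain(SketchSegmentedString& input) {
    std::u16string out;
    while (!input.isEmpty()) {
        out.push_back(input.currentChar());
        input.advance();
    }
    return out;
}
```

Appending an 8-bit segment followed by a 16-bit one and then calling
drain() yields a single sequence of 16-bit units, with each 8-bit value
widened at the point where it is read rather than when it is stored.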
>
> On Mar 7, 2013, at 2:16 PM, Adam Barth <abarth at webkit.org> wrote:
> > Yes, I understand how the HTML tokenizer works.  :)
>
> I didn't understand these details, and I really appreciate Michael
describing them.  I'm also glad others on the mailing list had an
opportunity to get something out of this.
>
> ~Brady

I agree with Brady. I learned some interesting things from this thread.
It is always nice to read explanations and documentation about how things
work. Valuable content.

Luis

>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev