[webkit-dev] Should we create an 8-bit path from the network stack to the parser?

Adam Barth abarth at webkit.org
Thu Mar 7 11:11:58 PST 2013


The HTMLTokenizer still works in UChars.  There's likely some
performance to be gained by moving it to an 8-bit character type.
There's some trickiness involved because HTML entities can expand to
characters outside of Latin-1. Also, it's unclear if we want two
tokenizers (one that's 8 bits wide and another that's 16 bits wide) or
if we should find a way for the 8-bit tokenizer to handle, for
example, UTF-16 encoded network responses.
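
To make the entity problem concrete, here is a minimal sketch of a
buffer that stays 8 bits wide until it sees a character outside
Latin-1, then upconverts once. TokenBuffer and its methods are
hypothetical names for illustration, not existing WebKit classes, and
the sketch uses standard library containers rather than WTF types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration only; not WebKit code.
// Holds Latin-1 bytes until a code unit above 0xFF is appended,
// then switches permanently to 16-bit storage.
class TokenBuffer {
public:
    void append(char16_t c)
    {
        // A character outside Latin-1 (for example from an expanded
        // entity) forces a one-time switch to 16-bit storage.
        if (m_is8Bit && c > 0xFF)
            upconvert();

        if (m_is8Bit)
            m_narrow.push_back(static_cast<std::uint8_t>(c));
        else
            m_wide.push_back(c);
    }

    bool is8Bit() const { return m_is8Bit; }

    std::size_t length() const
    {
        return m_is8Bit ? m_narrow.size() : m_wide.size();
    }

    char16_t at(std::size_t i) const
    {
        return m_is8Bit ? static_cast<char16_t>(m_narrow[i]) : m_wide[i];
    }

private:
    void upconvert()
    {
        // Latin-1 maps directly onto U+0000..U+00FF, so zero-extending
        // each byte is enough.
        m_wide.assign(m_narrow.begin(), m_narrow.end());
        m_narrow.clear();
        m_is8Bit = false;
    }

    bool m_is8Bit = true;
    std::vector<std::uint8_t> m_narrow;
    std::u16string m_wide;
};
```

With this shape, appending 'a' and U+00E9 keeps the buffer 8-bit, while
appending U+2014 (what &mdash; expands to) triggers the upconversion.
Whether a single adaptive buffer like this beats two separate
tokenizers is exactly the open question.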

Adam


On Thu, Mar 7, 2013 at 10:11 AM, Darin Adler <darin at apple.com> wrote:
> No. I retract my question. Sounds like we already have it right! Thanks for setting me straight.
>
> Maybe some day we could make a non-copying code path that points directly at the data in the SharedBuffer, but I have no idea if that'd be beneficial.
>
> -- Darin
>
> On Mar 7, 2013, at 10:01 AM, Michael Saboff <msaboff at apple.com> wrote:
>
>> There is an all-ASCII case in TextCodecUTF8::decode().  It should be keeping all ASCII data as 8-bit.  TextCodecWindowsLatin1::decode() not only has an all-ASCII case, but also upconverts to 16-bit only in a couple of rare cases.  Is there some other case you don't think we are handling?
>>
>> - Michael
>>
>> On Mar 7, 2013, at 9:29 AM, Darin Adler <darin at apple.com> wrote:
>>
>>> Hi folks.
>>>
>>> Today, bytes that come in from the network get turned into UTF-16 by the decoding process. We then turn some of them back into Latin-1 during the parsing process. Should we make changes so there’s an 8-bit path? It might be as simple as writing code that has more of an all-ASCII special case in TextCodecUTF8 and something similar in TextCodecWindowsLatin1.
>>>
>>> Is there something significant to be gained here? I’ve been wondering this for a while, so I thought I’d ask the rest of the WebKit contributors.
>>>
>>> -- Darin
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>

