[webkit-dev] Should we create an 8-bit path from the network stack to the parser?
Michael Saboff
msaboff at apple.com
Mon Mar 11 08:56:14 PDT 2013
Maciej,
*I* deemed using a character-type template for the HTMLTokenizer to be unwieldy. Given the existing SegmentedString input abstraction, it made logical sense to put the 8/16-bit handling there. Had I moved the 8/16-bit logic into the tokenizer itself, we might have needed 8->16 up-conversions whenever a SegmentedString contained substrings of mixed bit-ness. Even if that weren't the case, the patch would have been far larger and would likely have included tricky code for escapes.
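To make that trade-off concrete, here is a minimal C++ sketch of the idea described above: the input abstraction stores each substring in its native width and hands the consumer one 16-bit code unit at a time, so a tokenizer written against it needs no character-type template parameter, and mixed 8/16-bit segments never force an up-conversion of a whole buffer. The names (`MixedSegmentedString`, `append8`, `append16`, `current`, `advance`, `countTagOpens`) are hypothetical and do not reflect WebKit's actual `SegmentedString` API.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-in for a segmented input string.
// Each segment keeps its native width: 8-bit (Latin-1) or 16-bit (UTF-16).
class MixedSegmentedString {
public:
    void append8(std::string s) { if (!s.empty()) segments_.emplace_back(std::move(s)); }
    void append16(std::u16string s) { if (!s.empty()) segments_.emplace_back(std::move(s)); }

    bool isEmpty() const { return segments_.empty(); }

    // Returns the current character widened to 16 bits. Only this single
    // code unit is widened; an 8-bit segment is never up-converted as a whole.
    char16_t current() const {
        const Segment& seg = segments_.front();
        if (const auto* s8 = std::get_if<std::string>(&seg))
            return static_cast<char16_t>(static_cast<unsigned char>((*s8)[offset_]));
        return std::get<std::u16string>(seg)[offset_];
    }

    // Moves to the next character, dropping a segment once it is exhausted.
    void advance() {
        ++offset_;
        if (offset_ == length(segments_.front())) {
            segments_.pop_front();
            offset_ = 0;
        }
    }

private:
    using Segment = std::variant<std::string, std::u16string>;

    static std::size_t length(const Segment& seg) {
        return std::visit([](const auto& s) -> std::size_t { return s.size(); }, seg);
    }

    std::deque<Segment> segments_;
    std::size_t offset_ = 0;
};

// A toy "tokenizer" written once against char16_t: it counts '<' characters.
// It needs no template over the character type, even with mixed segments.
std::size_t countTagOpens(MixedSegmentedString& input) {
    std::size_t count = 0;
    while (!input.isEmpty()) {
        if (input.current() == u'<')
            ++count;
        input.advance();
    }
    return count;
}
```

For example, appending the 8-bit segment `"<p>caf\xE9"` and then the 16-bit segment `u"</p><b>\u2603</b>"` and passing the result to `countTagOpens` yields 4. This sketch pays a width check on every character; it only shows where the 8/16-bit decision lives, not how a production tokenizer keeps that check cheap.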
As I got into the middle of the 8-bit strings work, I realized that not only could I maintain performance parity, but some of the techniques I came up with offered good performance improvements. The HTMLTokenizer ended up being one of those cases. This patch required a couple of reworks for performance reasons and generated a lot of discussion from various parts of the WebKit community; see https://bugs.webkit.org/show_bug.cgi?id=90321 for the trail. Ryosuke noted that this patch was responsible for a 24% improvement on the url-parser test in their bots (comment 47). My final performance results are in comment 43 and show progressions of between 1% and 9% on the various HTML parser tests.
Adam, if you believe there is more work to be done in the HTMLTokenizer, please file a bug and cc me. I'm interested in hearing your thoughts.
- Michael
On Mar 9, 2013, at 4:24 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>
> On Mar 9, 2013, at 3:05 PM, Adam Barth <abarth at webkit.org> wrote:
>>
>> In retrospect, I think what I was reacting to was msaboff statement
>> that an unnamed group of people had decided that the HTML tokenizer
>> was too unwieldy to have a dedicated 8-bit path. In particular, it's
>> unclear to me who made that decision. I certainly do not consider the
>> matter decided.
>
> It would be good to find out who it was that said that (or more specifically: "Using a character type template approach was deemed to be too unwieldy for the HTML tokenizer.") so you can talk to them about it.
>
> Michael?
>
> Regards,
> Maciej
>