[webkit-dev] HTML5 tokenizer landing soon

Mon Jun 14 10:55:27 PDT 2010

On Mon, Jun 14, 2010 at 10:41 AM, Alexey Proskuryakov <ap at webkit.org> wrote:
> On 14.06.2010, at 10:21, Adam Barth wrote:
>> In the new world, the
>> preload scanner is very simple because the tokenization algorithm is
>> separate from the rest of what the old HTMLTokenizer class did (which
>> was a lot).
>
> Will we also be able to switch TextResourceDecoder::checkForHeadCharset()
> over? Currently, it implements a custom parser to find the <meta> charset,
> which is unfortunate.

Cry.

That should be possible.  The API for the lexer is very simple.  The
preload scanner (without CSS parsing) is only 116 lines (30 of which
are copyright and other boilerplate).  The one complication might be that
TextResourceDecoder::checkForHeadCharset seems to operate on the raw
byte stream, whereas the lexer operates on a stream of code units.
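One way to bridge the bytes-versus-code-units gap, sketched here under assumptions rather than taken from WebKit: the <meta> charset syntax is pure ASCII, so mapping each raw byte to the code unit with the same value (what a Latin-1 decode does) lets a code-unit scanner run over the undecoded prefix without knowing the real encoding. The names widenBytes and findMetaCharset are hypothetical, and findMetaCharset is a toy stand-in for the real lexer, not its API. It ignores comments, <script> contents, and http-equiv/content, all of which a real implementation must handle.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Lowercase an ASCII code unit; leave everything else untouched.
static char16_t asciiLower(char16_t c) {
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + (u'a' - u'A')) : c;
}

static bool isHtmlSpace(char16_t c) {
    return c == u' ' || c == u'\t' || c == u'\n' || c == u'\f' || c == u'\r';
}

// Hypothetical helper: map each raw byte to the code unit with the same
// value (a Latin-1 style "decode"). ASCII bytes keep their meaning, which
// is all the <meta> syntax needs.
std::u16string widenBytes(std::string_view bytes) {
    std::u16string units;
    units.reserve(bytes.size());
    for (unsigned char b : bytes)
        units.push_back(static_cast<char16_t>(b));
    return units;
}

// Hypothetical toy scanner over code units: returns the lowercased charset
// attribute value of the first <meta> start tag that has one.
std::optional<std::string> findMetaCharset(std::u16string_view s) {
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] != u'<' || i + 5 > n) continue;
        // Match "meta" case-insensitively right after '<'.
        if (asciiLower(s[i + 1]) != u'm' || asciiLower(s[i + 2]) != u'e' ||
            asciiLower(s[i + 3]) != u't' || asciiLower(s[i + 4]) != u'a') continue;
        std::size_t j = i + 5;
        // Reject longer tag names such as <metadata>.
        if (j < n && !isHtmlSpace(s[j]) && s[j] != u'/' && s[j] != u'>') continue;
        // Walk the attributes until the tag closes.
        while (j < n && s[j] != u'>') {
            while (j < n && (isHtmlSpace(s[j]) || s[j] == u'/')) ++j;
            std::u16string name;
            while (j < n && !isHtmlSpace(s[j]) && s[j] != u'=' && s[j] != u'>' && s[j] != u'/')
                name.push_back(asciiLower(s[j++]));
            while (j < n && isHtmlSpace(s[j])) ++j;
            std::u16string value;
            if (j < n && s[j] == u'=') {
                ++j;
                while (j < n && isHtmlSpace(s[j])) ++j;
                if (j < n && (s[j] == u'"' || s[j] == u'\'')) {
                    char16_t quote = s[j++];
                    while (j < n && s[j] != quote) value.push_back(asciiLower(s[j++]));
                    if (j < n) ++j;  // skip the closing quote
                } else {
                    while (j < n && !isHtmlSpace(s[j]) && s[j] != u'>')
                        value.push_back(asciiLower(s[j++]));
                }
            }
            if (name == u"charset" && !value.empty()) {
                std::string out;
                for (char16_t c : value) out.push_back(static_cast<char>(c));  // charset labels are ASCII
                return out;
            }
        }
        i = j;  // resume scanning after this tag
    }
    return std::nullopt;
}
```

For example, findMetaCharset(widenBytes("<META Charset=\"UTF-8\">")) yields "utf-8" even though the input was never decoded as UTF-8. The same-value byte mapping is the design point: it never fails and preserves byte offsets, so the lexer only needs to see ASCII structure.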

Adam