[webkit-dev] Writing a new XML parser with no external libraries

Wed Jun 29 07:50:32 PDT 2011

On Wed, Jun 29, 2011 at 7:18 AM,  <paroga at paroga.com> wrote:
> On Wed, 29 Jun 2011 06:55:57 -0700, Alex Milowski <alex at milowski.org>
> wrote:
>> I know the parser's speed is terrible because I measured it recently.
>> This is partially due to some of the things we are doing to deal with
>> Unicode decoding to work around libxml2 issues.  I think moving to
>> native strings and decoding would improve the speed by a huge amount.
>> It would be well worth it for someone to fix this.
>
> With the same UTF-8 content, the libxml2 parser is _faster_ than our HTML
> parser:
> https://bugs.webkit.org/show_bug.cgi?id=52036#c1
>
> I don't think there is a huge difference between the HTML and XML
> parsers, so the comparison should be valid in this case.
>
> After my (simple) performance tests, I still think that parsing UTF-8 is
> better than parsing UTF-16, since UTF-8 data usually takes only half the
> memory.
>
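The "half the memory" point can be checked with a small sketch. The sample documents below are hypothetical, and UTF-16 is measured as little-endian without a BOM. Markup dominated by ASCII tags and attributes takes exactly half the bytes in UTF-8. For CJK text, UTF-8 uses 3 bytes per character against 2 in UTF-16, which is why the saving holds only "usually".

```python
# Compare the byte size of the same markup encoded as UTF-8 and as UTF-16.
# The sample documents are hypothetical; real pages will vary.

def encoded_sizes(text: str) -> dict:
    """Return byte counts for UTF-8 and UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

ascii_markup = '<doc><item id="1">Hello, world</item></doc>'
cjk_markup = '<doc><item id="1">こんにちは世界</item></doc>'

for name, sample in [("ascii", ascii_markup), ("cjk", cjk_markup)]:
    sizes = encoded_sizes(sample)
    ratio = sizes["utf-8"] / sizes["utf-16"]
    print(f"{name}: utf-8={sizes['utf-8']} utf-16={sizes['utf-16']} ratio={ratio:.2f}")
```

In the CJK sample, the ASCII markup still keeps UTF-8 smaller overall. A run of CJK text on its own would reverse the comparison.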

I should test your patch against the speed tests I used.  I'll try to
get to that soon.

It is unclear to me how this relates to the original reasons why we
decode, recode, and then decode due to issues with libxml2.
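A hypothetical sketch of the cost of a decode, recode, and decode round trip follows. It assumes the input bytes arrive in a declared charset (ISO-8859-1 here) and are first decoded to a native string. They are then re-encoded to UTF-8 for libxml2, which decodes them again. This is not WebKit's actual code. Python strings stand in for native UTF-16 strings, and the function names are illustrative only.

```python
# Hypothetical illustration of a decode -> recode -> decode pipeline versus a
# single decode. Not WebKit's code; Python str stands in for a native string.

def decode_recode_decode(raw: bytes, charset: str) -> tuple:
    """Return (text, number of full passes over the data)."""
    text = raw.decode(charset)        # pass 1: bytes -> native string
    utf8 = text.encode("utf-8")       # pass 2: native string -> UTF-8 for the parser
    reparsed = utf8.decode("utf-8")   # pass 3: stand-in for the parser's own decode
    return reparsed, 3

def decode_once(raw: bytes, charset: str) -> tuple:
    """Decode straight into a native string in a single pass."""
    return raw.decode(charset), 1

sample = "<p>café</p>".encode("latin-1")
print(decode_recode_decode(sample, "latin-1"))
print(decode_once(sample, "latin-1"))
```

Both paths yield the same text. The round trip makes three full passes over the data where one would do, which is the overhead that decoding once into native strings would avoid.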

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics