[webkit-dev] Writing a new XML parser with no external libraries

Evan Martin evan at chromium.org
Wed Jun 29 08:55:42 PDT 2011


On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jpfau at apple.com> wrote:
> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming XML. However, QtXml isn't always available, and using libxml2 exposes its own share of problems. As such, I'm undertaking writing an XML parser that uses no external libraries.
>
> The first step to doing this is to add a new flag that switches off the other two parsers. As the parsers are already independent and can be switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2) checks, replacing the #else conditionals, and also a new ENABLE check, tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE, etc, may be more appropriate).

I have also daydreamed about undertaking such a project, as libxml is
generally kind of terrifying to me.  (At one point I found we had
somehow compiled its table of HTML tags from its half-hearted HTML
processor inside a Chrome binary, sigh.)  Aside from XSLT (which
others have already mentioned), the other problem that kept me from
this is that any reasonably large project that sits atop WebCore
likely needs to parse XML already, which means that a library
like libxml is already a dependency for other reasons.  For example, I
am pretty sure in Chrome's case using your new XML library would only
serve to double our XML parser code weight.

Here are some places where Chrome uses XML outside of processing web pages:
http://codesearch.google.com/codesearch#search/&exact_package=chromium&q=libxml%20-file:third_party%20file:cc&type=cs

It would be nice if you could construct your library such that it
either wasn't buried within the guts of WebCore or such that it could
be used independently of WebCore.  But I wouldn't be surprised if
neither of those is really an important goal for you; I'm also not
sure there's much public demand for a UTF-16-only XML parser outside
of WebKit internals.


PS: I wrote a much simpler and setjmp-free PNG decoder library with
the intent of integrating it into Chrome, then went through more or
less the same thought process as above and shelved it.

