[webkit-dev] Writing a new XML parser with no external libraries

Wyatt Carss wcarss at google.com
Tue Jun 28 18:44:06 PDT 2011


If that were all, would it be possible to patch libxml2 to use UTF-16? That
might be less of an undertaking than writing a new XML library, but that
could just be my youthful naivety.
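
To make the conversion cost concrete, here is a rough, standalone sketch of
the kind of UTF-16 to UTF-8 transcoding that has to happen at a boundary like
this. It is not WebKit or libxml2 code; the function name utf16ToUtf8 is made
up for illustration, and I'm assuming unpaired surrogates simply become U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a WebKit or libxml2 API): transcodes a UTF-16
// string to UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size()); // lower bound; non-ASCII input needs more bytes
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with a following low surrogate if present.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate with no preceding high surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Every code unit gets inspected and re-encoded, and the same work runs in the
other direction on the way back, which is the overhead under discussion.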

On Tue, Jun 28, 2011 at 6:36 PM, Jeffrey Pfau <jpfau at apple.com> wrote:

> I don't know all of the problems libxml2 has, but one of the ones I've
> heard is that WebCore uses UTF-16 internally, and libxml2 uses UTF-8, so the
> data is perpetually converted between the two formats--and this is slow. If
> there are any other big ones, I haven't been told them, only that it would
> be good to have a replacement.
>
> Jeffrey Pfau
>
> On Jun 28, 2011, at 6:30 PM, Dirk Pranke wrote:
>
> > Can you expand a bit more on "using libxml2 exposes its own share of
> > problems"?
> >
> > -- Dirk
> >
> > On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jpfau at apple.com> wrote:
> >> Currently, WebCore uses libxml2, or, if available, QtXml to parse
> >> incoming XML. However, QtXml isn't always available, and using libxml2
> >> exposes its own share of problems. As such, I'm undertaking writing an XML
> >> parser that uses no external libraries.
> >>
> >> The first step to doing this is to add a new flag that switches off the
> >> other two parsers. As the parsers are already independent and can be
> >> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
> >> checks, replacing the #else conditionals, and also a new ENABLE check,
> >> tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE,
> >> etc, may be more appropriate).
> >>
> >> As there will probably be a new slew of files pertaining to XML parsing,
> >> I will put these files in WebCore/xml/parser, and move the existing
> >> XMLDocumentParser* files into this new directory. As far as I know, the
> >> placement of these files in WebCore/dom/ is legacy, and, assuming the build
> >> on each platform is changed, it makes sense to move them.
> >>
> >> Once all the files are in a logical place, I plan to make a new file for
> >> a skeleton of the new XMLDocumentParser, at least to get it to link until a
> >> real one is in place, even if the XML parser at that point is just a data
> >> sink.
> >>
> >> From there, I plan to copy and modify a good chunk of the lower level
> >> HTML tokenization and parsing code, and make changes as necessary to make it
> >> work on generalized XML, at least until I can generalize the common code in
> >> such a way that the HTML and XML tokenizers can be subclasses and use common
> >> code. I'd probably do the refactoring at the end.
> >>
> >> I'm still exploring the existing parsing code, but I'd probably work my
> >> way up from there. I've read a lot of the XML 1.0 spec in preparation, as
> >> well, but it doesn't have much on implementation itself. If QtWebKit or
> >> parsing people have any comments, concerns, or help, I'd be more than
> >> willing to listen--I'm just starting here, and I'm not completely familiar
> >> with the codebase.
> >>
> >> Although no code is checked in so far, I've started on this list already
> >> and have gotten as far as the new flags, a skeleton
> >> XMLDocumentParserNew.cpp, and making a tokenizer that compiles and links,
> >> but is completely untested.
> >>
> >> Jeffrey Pfau
> >> _______________________________________________
> >> webkit-dev mailing list
> >> webkit-dev at lists.webkit.org
> >> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
> >>
>