[webkit-dev] Writing a new XML parser with no external libraries

TAMURA, Kent tkent at chromium.org
Tue Jun 28 20:42:30 PDT 2011


I'm a little negative about developing a new XML parser. I'm afraid that a
new parser would introduce a lot of security and stability problems that
existing parsers have already resolved.

How about importing the Expat parser into the WebKit repository and
maintaining it ourselves?


On Wed, Jun 29, 2011 at 10:12, Jeffrey Pfau <jpfau at apple.com> wrote:

> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming
> XML. However, QtXml isn't always available, and using libxml2 exposes its
> own share of problems. As such, I'm undertaking writing an XML parser that
> uses no external libraries.

> The first step to doing this is to add a new flag that switches off the
> other two parsers. As the parsers are already independent and can be
> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
> checks, replacing the #else conditionals, and also a new ENABLE check,
> tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE,
> etc., may be more appropriate).
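
The flag arrangement described above could look roughly like the following
sketch. The USE() and ENABLE() macros here are simplified stand-ins defined
locally so the snippet builds on its own, the flag values are assumptions, and
xmlParserBackend() is a hypothetical helper that exists only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Simplified local stand-ins for WebKit's USE()/ENABLE() macros.
// The three flag values below are assumptions chosen for this sketch.
#define WTF_USE_QXMLSTREAM 0
#define WTF_USE_LIBXML2 0
#define ENABLE_NEW_XML 1 // tentative flag name from the proposal

#define USE(feature) (WTF_USE_##feature)
#define ENABLE(feature) (ENABLE_##feature)

// Hypothetical helper: reports which backend the flags select.
// Each branch replaces what used to be a bare #else fallthrough.
const char* xmlParserBackend()
{
#if ENABLE(NEW_XML)
    return "native";
#elif USE(QXMLSTREAM)
    return "QXmlStream";
#elif USE(LIBXML2)
    return "libxml2";
#else
#error "No XML parser backend selected"
#endif
}

// Hypothetical self-check for the configuration chosen above.
void checkBackendSelection()
{
    assert(std::strcmp(xmlParserBackend(), "native") == 0);
}
```

Making every backend an explicit check, rather than leaving libxml2 as the
implicit #else, means a build with no backend selected fails at compile time
instead of silently picking one.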

> As there will probably be a new slew of files pertaining to XML parsing, I
> will put these files in WebCore/xml/parser, and move the existing
> XMLDocumentParser* files into this new directory. As far as I know, the
> placement of these files in WebCore/dom/ is legacy, and, assuming the build
> on each platform is changed, it makes sense to move them.

> Once all the files are in a logical place, I plan to make a new file for a
> skeleton of the new XMLDocumentParser, at least to get it to link until a
> real one is in place, even if the XML parser at that point is just a data
> sink.
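
A "data sink" skeleton of the kind described above might be as small as the
sketch below. XMLDataSinkParser and its methods are hypothetical names, not
WebKit's XMLDocumentParser interface; the only point is that input is accepted
and discarded so the rest of the build can link.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in, not WebKit's XMLDocumentParser interface.
class XMLDataSinkParser {
public:
    // Accepts a chunk of source text and discards it, recording only its size.
    void append(const std::string& chunk)
    {
        assert(!m_finished); // No input is expected after finish().
        m_bytesSeen += chunk.size();
    }

    // Marks the end of input; nothing is parsed or built.
    void finish() { m_finished = true; }

    std::size_t bytesSeen() const { return m_bytesSeen; }
    bool isFinished() const { return m_finished; }

private:
    std::size_t m_bytesSeen = 0;
    bool m_finished = false;
};
```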

> From there, I plan to copy and modify a good chunk of the lower level HTML
> tokenization and parsing code, and make changes as necessary to make it work
> on generalized XML, at least until I can generalize the common code in such
> a way that the HTML and XML tokenizers can be subclasses and use common
> code. I'd probably do the refactoring at the end.
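
As a rough illustration of the shared-base idea above, the sketch below keeps
one scanning loop in a base class and lets subclasses differ only in how tag
names are normalized (HTML tag names are case-insensitive, XML names are
case-sensitive). All class names are hypothetical and none of this is WebKit
code; attributes, comments, CDATA and entities are left out on purpose.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; none of these are WebKit classes.
struct Token {
    enum Type { StartTag, EndTag, Character } type;
    std::string data;
};

class MarkupTokenizerBase {
public:
    virtual ~MarkupTokenizerBase() = default;

    // Shared scanning loop: splits input into tags and character runs.
    std::vector<Token> tokenize(const std::string& input) const
    {
        std::vector<Token> tokens;
        std::size_t i = 0;
        while (i < input.size()) {
            if (input[i] == '<') {
                std::size_t close = input.find('>', i);
                if (close == std::string::npos)
                    break; // Unterminated tag: stop here; a real parser would report an error.
                bool isEnd = (i + 1 < close && input[i + 1] == '/');
                std::size_t nameStart = i + (isEnd ? 2 : 1);
                std::string name = input.substr(nameStart, close - nameStart);
                tokens.push_back({ isEnd ? Token::EndTag : Token::StartTag, normalizeName(name) });
                i = close + 1;
            } else {
                std::size_t next = input.find('<', i);
                if (next == std::string::npos)
                    next = input.size();
                tokens.push_back({ Token::Character, input.substr(i, next - i) });
                i = next;
            }
        }
        return tokens;
    }

protected:
    // The one point where the two languages differ in this sketch.
    virtual std::string normalizeName(const std::string& name) const = 0;
};

class HTMLTokenizerSketch : public MarkupTokenizerBase {
protected:
    // HTML tag names are ASCII case-insensitive, so fold them to lowercase.
    std::string normalizeName(const std::string& name) const override
    {
        std::string lowered = name;
        for (char& c : lowered)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return lowered;
    }
};

class XMLTokenizerSketch : public MarkupTokenizerBase {
protected:
    // XML names are case-sensitive, so they pass through unchanged.
    std::string normalizeName(const std::string& name) const override { return name; }
};
```

Given the input "<Doc>hi</Doc>", both sketches produce a start tag, a
character run and an end tag; the HTML variant reports the tag name as "doc"
while the XML variant keeps "Doc".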

> I'm still exploring the existing parsing code, but I'd probably work my way
> up from there. I've read a lot of the XML 1.0 spec in preparation, as well,
> but it doesn't have much on implementation itself. If QtWebKit or parsing
> people have any comments, concerns, or help, I'd be more than willing to
> listen--I'm just starting here, and I'm not completely familiar with the
> codebase.

> Although no code is checked in so far, I've already started on the steps
> above and have gotten as far as the new flags, a skeleton
> XMLDocumentParserNew.cpp, and making a tokenizer that compiles and links,
> but is completely untested.

> Jeffrey Pfau
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev




-- 
TAMURA Kent
Software Engineer, Google



