[webkit-dev] Writing a new XML parser with no external libraries

Adam Barth abarth at webkit.org
Tue Jun 28 19:37:53 PDT 2011


In case you're not aware, I believe you can access the XML parser via
JavaScript at window.DOMParser, which might be helpful for testing.

Adam
On Jun 28, 2011 6:41 PM, "Jeffrey Pfau" <jpfau at apple.com> wrote:
> See responses inline:
>
> On Jun 28, 2011, at 6:26 PM, Adam Barth wrote:
>
>> A question and a comment:
>>
>> 1) Will this let us remove the code for both the libxml2 and the
>> QtXml parsers? I'd certainly much rather have one XML parser than
>> three.
>
> This won't replace libxslt or QtXmlPatterns for XSL-T, as they depend on
> the respective XML libraries. The goal for this XML parser is to be able to
> replace the core XML parser itself. XSL-T support would have to come later.
>
>> 2) One thing we found very helpful in working on the HTML parser was a
>> good test suite. Presumably there are existing XML parsing test
>> suites. You might consider landing one (or more) of these test suites
>> as a first step.
>>
>> Adam
>
> I know that W3C provides a test suite, but it's probably not that
> comprehensive. I can try to find more online; I'm sure that some of the open
> source projects like libxml2 provide some.
>
> Jeffrey Pfau
>
>>
>> On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jpfau at apple.com> wrote:
>>> Currently, WebCore uses libxml2 or, if available, QtXml to parse
>>> incoming XML. However, QtXml isn't always available, and using libxml2
>>> exposes its own share of problems. As such, I'm undertaking to write an
>>> XML parser that uses no external libraries.
>>>
>>> The first step is to add a new flag that switches off the other two
>>> parsers. Because the parsers are already independent and can be switched
>>> between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2) checks to
>>> replace the #else conditionals, along with a new ENABLE check,
>>> tentatively called NEW_XML (although a name such as NATIVE_XML or
>>> XML_NATIVE may be more appropriate).
>>>
>>> As there will probably be a new slew of files pertaining to XML parsing,
>>> I will put these files in WebCore/xml/parser and move the existing
>>> XMLDocumentParser* files into this new directory. As far as I know, the
>>> placement of these files in WebCore/dom/ is legacy, and, assuming the build
>>> on each platform is updated accordingly, it makes sense to move them.
>>>
>>> Once all the files are in a logical place, I plan to make a new file for
>>> a skeleton of the new XMLDocumentParser, at least to get it to link until a
>>> real one is in place, even if the XML parser at that point is just a data
>>> sink.
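A skeleton that is "just a data sink" might look roughly like the following C++ sketch. The class and method names are hypothetical and are not meant to reproduce WebKit's real DocumentParser interface; the point is only that incremental input is accepted into a buffer with no DOM construction, which is enough to get a new code path compiling and linking.

```cpp
#include <string>

// Hypothetical placeholder parser: it accepts input incrementally and
// records that parsing finished, but builds no DOM.
class SkeletonXMLDocumentParser {
public:
    // Called as data arrives; the skeleton only buffers it.
    // Input arriving after finish() is ignored.
    void append(const std::string& chunk)
    {
        if (!m_finished)
            m_buffer += chunk;
    }

    // Called once all input has been delivered.
    void finish() { m_finished = true; }

    bool isFinished() const { return m_finished; }
    const std::string& bufferedInput() const { return m_buffer; }

private:
    std::string m_buffer;
    bool m_finished = false;
};
```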
>>>
>>> From there, I plan to copy and modify a good chunk of the lower-level
>>> HTML tokenization and parsing code, adapting it as necessary to handle
>>> generalized XML, at least until I can generalize the common code in such
>>> a way that the HTML and XML tokenizers can be subclasses sharing common
>>> code. I'd probably do the refactoring at the end.
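The eventual refactoring, with the HTML and XML tokenizers as subclasses of shared code, could take a shape like this toy C++ sketch. All names are hypothetical. The base class owns the scanning loop, and each subclass overrides only tag-name normalization, illustrating one real difference between the languages: HTML tag names are case-insensitive, whereas XML names are case-sensitive. It ignores attributes, comments, entities, and error recovery, so it does not model either real tokenizer.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token model shared by both tokenizers.
struct Token {
    enum class Type { StartTag, EndTag, Characters };
    Type type;
    std::string data; // tag name or character data
};

// Shared scanning logic lives in the base class; subclasses only decide how
// tag names are normalized.
class MarkupTokenizerSketch {
public:
    virtual ~MarkupTokenizerSketch() = default;

    std::vector<Token> tokenize(const std::string& input) const
    {
        std::vector<Token> tokens;
        std::string text;
        std::size_t i = 0;
        while (i < input.size()) {
            if (input[i] != '<') {
                text += input[i++];
                continue;
            }
            if (!text.empty()) {
                tokens.push_back({Token::Type::Characters, text});
                text.clear();
            }
            std::size_t close = input.find('>', i);
            if (close == std::string::npos)
                break; // Unterminated tag: the toy simply stops here.
            bool isEnd = i + 1 < close && input[i + 1] == '/';
            std::size_t nameStart = i + (isEnd ? 2 : 1);
            std::string name = normalizeTagName(input.substr(nameStart, close - nameStart));
            tokens.push_back({isEnd ? Token::Type::EndTag : Token::Type::StartTag, name});
            i = close + 1;
        }
        if (!text.empty())
            tokens.push_back({Token::Type::Characters, text});
        return tokens;
    }

protected:
    virtual std::string normalizeTagName(const std::string& name) const = 0;
};

// HTML tag names are case-insensitive, so they are ASCII-lowercased.
class HTMLTokenizerSketch : public MarkupTokenizerSketch {
protected:
    std::string normalizeTagName(const std::string& name) const override
    {
        std::string lowered = name;
        for (char& c : lowered)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return lowered;
    }
};

// XML names are case-sensitive, so they are kept exactly as written.
class XMLTokenizerSketch : public MarkupTokenizerSketch {
protected:
    std::string normalizeTagName(const std::string& name) const override { return name; }
};
```

Tokenizing "<Foo>bar</Foo>" yields the start tag "foo" with the HTML subclass but "Foo" with the XML subclass, while the character token "bar" comes from the same shared loop in both cases.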
>>>
>>> I'm still exploring the existing parsing code, but I'd probably work my
>>> way up from there. I've also read much of the XML 1.0 spec in
>>> preparation, but it says little about implementation itself. If QtWebKit
>>> or parsing people have any comments, concerns, or help, I'd be more than
>>> willing to listen; I'm just starting here, and I'm not completely familiar
>>> with the codebase.
>>>
>>> Although no code is checked in so far, I've started on this list already
>>> and have gotten as far as the new flags, a skeleton
>>> XMLDocumentParserNew.cpp, and a tokenizer that compiles and links but is
>>> completely untested.
>>>
>>> Jeffrey Pfau
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>>
>

