[webkit-dev] Writing a new XML parser with no external libraries

Wed Jun 29 08:33:03 PDT 2011

On Wed, Jun 29, 2011 at 6:55 AM, Alex Milowski <alex at milowski.org> wrote:

> On Tue, Jun 28, 2011 at 6:50 PM, Eric Seidel <eric at webkit.org> wrote:
> >
> > I'm in general in favor of this effort (having worked extensively on
> > the existing XML parsers).
> >
> > But I would caution you that xml is a ridiculously tiny fraction of
> > the web.  And it may not be worth the engineering effort to make a
> > better parser.
> >
> > http://www.google.com/search?q=filetype:html = 25,270,000,000
> > http://www.google.com/search?q=filetype:xml = 71,000,000
> >
>
> I can't let this one just pass by! ;)
>
> First, filetype is by extension and not media type [1].  As such, that
> is an incorrect accounting of the amount of XML on the web.  Secondly,
> just using file extensions, you'd have to enumerate and sum all the
> extensions used by all XML media types (e.g. .xhtml, .svg, etc.).
> Third, there is plenty of content on the web that Google does not
> crawl (the "dark web") where there are petabytes of XML waiting for
> browsers to do something with it (e.g. astronomical data cone search
> services).
>

+1.  Also, a lot of .asp, .php, etc. files serve XHTML content.

- Ryosuke