[webkit-dev] libxml2 "override encoding" support
Alex Milowski
alex at milowski.org
Wed Jan 5 07:26:08 PST 2011
On Wed, Jan 5, 2011 at 5:07 AM, Patrick Gansterer <paroga at paroga.com> wrote:
>
> Is there a reason why we can't pass the "raw" data to libxml2?
> E.g. when the input file is UTF-8 we convert it into UTF-16 and then libxml2 converts it back into UTF-8 (its internal format). This is a real performance problem when parsing XML [1].
> Is there some (required) magic involved when detecting the encoding in WebKit? AFAIK XML always defaults to UTF-8 if there's no encoding declared.
> Can we make libxml2 do the encoding detection and provide all of our decoders so it can use them?
>
> [1] https://bugs.webkit.org/show_bug.cgi?id=43085
>
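To make the comparison concrete, what Patrick is suggesting amounts to
handing libxml2 the undecoded bytes and letting it do its own detection.
A minimal sketch (not actual WebKit code; the function name is made up):

    #include <libxml/parser.h>

    /* Parse raw network bytes directly. Passing encoding = NULL lets
     * libxml2 sniff the BOM / XML declaration and fall back to UTF-8,
     * as the XML spec requires. */
    static xmlDocPtr parseRawBytes(const char *bytes, int length)
    {
        return xmlReadMemory(bytes, length, "document.xml", NULL, 0);
    }
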
Looking at that bug, the "XSLT argument" is a red herring. We don't
use libxml2's data structures, so when we use libxslt we either turn
the XML parsing completely over to libxslt or we serialize and re-parse
(that's how the JavaScript-invoked XSLT works). In both cases, we're
probably paying a penalty for this double decoding between Unicode
encodings.
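For reference, the path we take today is roughly: decode the response
bytes to UTF-16, then tell libxml2 the data is UTF-16, which makes it
transcode everything straight back to UTF-8 internally. A simplified
sketch (not the actual WebKit code; assumes a little-endian host for
the UTF16LE cast):

    #include <stdint.h>
    #include <libxml/parser.h>
    #include <libxml/parserInternals.h>

    /* Sketch of the double conversion: the text has already been decoded
     * to UTF-16 by the caller, and libxml2 is told so, which makes it
     * convert everything back to UTF-8 (its internal encoding). */
    static xmlDocPtr parseDecodedText(const uint16_t *text, size_t lengthInUnits)
    {
        xmlParserCtxtPtr ctxt = xmlCreatePushParserCtxt(NULL, NULL, NULL, 0, "document.xml");
        xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);
        xmlParseChunk(ctxt, (const char *)text, (int)(lengthInUnits * sizeof(uint16_t)), 1);
        xmlDocPtr doc = ctxt->myDoc;
        xmlFreeParserCtxt(ctxt);
        return doc;
    }

When the original response was UTF-8, that's two full transcoding passes
for data libxml2 could have consumed as-is.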
A native XML parser for WebKit would only help when you aren't using
XSLT. A native (or different) XSLT processor in conjunction with a
native XML parser would help in all cases.
The XSLT processor question is a thorny one that I brought up a while
ago. I personally would love to see us use a processor that integrates
better with WebKit's API. There are a handful of choices, but many of
them are XSLT 2.0 processors.
--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."
Bertrand Russell in a footnote of Principles of Mathematics