[webkit-dev] libxml2 "override encoding" support

Alex Milowski alex at milowski.org
Tue Jan 4 19:15:07 PST 2011

On Tue, Jan 4, 2011 at 7:05 PM, Alexey Proskuryakov <ap at webkit.org> wrote:
> On 04.01.2011, at 18:40, Alex Milowski wrote:
>> Looking at the libxml2 API, I've been baffled myself about how to
>> control the character encoding from the outside.  This looks like a
>> serious lack of an essential feature.  Does anyone know about the
>> "hack" above and can provide more detail?
> Here is some history: <http://mail.gnome.org/archives/xml/2006-February/msg00052.html>, <https://bugzilla.gnome.org/show_bug.cgi?id=614333>.

Well, that is some interesting history.  *sigh*

I take it the workaround is that the data is read and decoded into an
internal string, which is represented as a sequence of UChar.  We then
treat that string as UTF-16 encoded data and feed it to the parser,
forcing it to use UTF-16 every time.
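
To make the shape of that workaround concrete, here is a minimal sketch.
It is an illustration only: it uses Python's standard-library
ElementTree parser (expat-based) rather than libxml2, and the helper
name parse_with_transport_encoding is hypothetical.  The idea is the
same, though: decode outside the parser with the encoding we trust, then
hand the parser already-decoded text so the encoding named in the XML
declaration plays no part in decoding.

```python
import xml.etree.ElementTree as ET


def parse_with_transport_encoding(raw: bytes, transport_encoding: str) -> ET.Element:
    """Hypothetical helper: decode first, then parse the decoded text."""
    # Step 1: decode outside the parser, using the encoding reported by
    # the transport (for example, an HTTP Content-Type charset).
    text = raw.decode(transport_encoding)
    # Step 2: give the parser a decoded string instead of raw bytes.
    # ElementTree passes str input to expat in a fixed encoding, so the
    # encoding named in the XML declaration is not used for decoding.
    return ET.fromstring(text)


# The declaration claims UTF-8, but the bytes are really ISO-8859-1.
raw = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("iso-8859-1")
root = parse_with_transport_encoding(raw, "iso-8859-1")
print(root.text)
```

The cost is the one mentioned above: every document goes through an
extra decode step and a fixed internal encoding, whatever it was on the
wire.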

Too bad we can't just tell it the proper encoding--possibly the one
from the transport--and let it do the decoding on the raw data.  Of
course, that doesn't guarantee a better result.

--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language

Bertrand Russell in a footnote of Principles of Mathematics
