[webkit-dev] libxml2 "override encoding" support
Alex Milowski
alex at milowski.org
Tue Jan 4 18:40:22 PST 2011
I'm working through some rather thorny experiments with new XML
support within the browser and I ran into this snippet:
static void switchToUTF16(xmlParserCtxtPtr ctxt)
{
    // Hack around libxml2's lack of encoding override support by manually
    // resetting the encoding to UTF-16 before every chunk. Otherwise libxml
    // will detect <?xml version="1.0" encoding="<encoding name>"?> blocks
    // and switch encodings, causing the parse to fail.
    const UChar BOM = 0xFEFF;
    const unsigned char BOMHighByte = *reinterpret_cast<const unsigned char*>(&BOM);
    xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF ? XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);
}
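As far as I can tell, the byte-order test in that snippet works like this: U+FEFF is stored as a 16-bit code unit, and the first byte in memory is 0xFF on a little-endian host and 0xFE on a big-endian one. Despite its name, BOMHighByte holds that first byte in memory. Here is a minimal standalone sketch of just that check, without libxml2. The names Utf16ByteOrder, firstByteInMemory and hostUtf16ByteOrder are hypothetical ones made up for illustration; they are not part of WebKit or libxml2.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names for illustration only; not WebKit or libxml2 API.
enum class Utf16ByteOrder { LittleEndian, BigEndian };

// Return the byte stored at the lowest address of a 16-bit value.
// memcpy is used instead of a pointer cast; the result is the same byte
// the original snippet reads through reinterpret_cast.
inline unsigned char firstByteInMemory(std::uint16_t value)
{
    unsigned char byte;
    std::memcpy(&byte, &value, 1);
    return byte;
}

// U+FEFF is laid out as FF FE on a little-endian host and FE FF on a
// big-endian host, so the first byte identifies the host's UTF-16 order.
inline Utf16ByteOrder hostUtf16ByteOrder()
{
    return firstByteInMemory(0xFEFF) == 0xFF
        ? Utf16ByteOrder::LittleEndian
        : Utf16ByteOrder::BigEndian;
}
```

In the original function the result of this check picks between XML_CHAR_ENCODING_UTF16LE and XML_CHAR_ENCODING_UTF16BE, presumably so that the encoding handed to xmlSwitchEncoding matches the in-memory layout of the UChar buffers being fed to the parser.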
Looking at the libxml2 API, I've been baffled myself about how to
control the character encoding from the outside. This looks like a
serious lack of an essential feature. Does anyone know more about the
"hack" above and can provide more detail?
--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."
Bertrand Russell in a footnote of Principles of Mathematics