[webkit-dev] libxml2 "override encoding" support

Alex Milowski alex at milowski.org
Tue Jan 4 18:40:22 PST 2011


I'm working through some rather thorny experiments with new XML
support within the browser and I ran into this snippet:

static void switchToUTF16(xmlParserCtxtPtr ctxt)
{
    // Hack around libxml2's lack of encoding overide support by manually
    // resetting the encoding to UTF-16 before every chunk.  Otherwise libxml
    // will detect <?xml version="1.0" encoding="<encoding name>"?> blocks
    // and switch encodings, causing the parse to fail.
    const UChar BOM = 0xFEFF;
    const unsigned char BOMHighByte = *reinterpret_cast<const unsigned
char*>(&BOM);
    xmlSwitchEncoding(ctxt, BOMHighByte == 0xFF ?
XML_CHAR_ENCODING_UTF16LE : XML_CHAR_ENCODING_UTF16BE);
}

Looking at the libxml2 API, I've been baffled myself about how to
control the character encoding from the outside.  This looks like a
serious lack of an essential feature.  Anyone know about this above
"hack" and can provide more detail?

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics


More information about the webkit-dev mailing list