[Webkit-unassigned] [Bug 15914] [GTK] Implement Unicode functionality using GLib

Mon Nov 24 09:35:08 PST 2008

https://bugs.webkit.org/show_bug.cgi?id=15914

------- Comment #44 from darin at apple.com  2008-11-24 09:35 PDT -------
(In reply to comment #42)
> Unfortunately not (yet). I've started looking into it. What's the expected
> outcome? Would this patch have to pass all of them?

The regression tests tell you what behavior you're changing. Policy for what
can be right and wrong in a given port is a different question. We have a
mechanism for port-specific results for various tests, so a test can be an
expected failure in some ports but not in others.
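
As a rough illustration of how a port-specific result lookup can work, here is
a minimal Python sketch. The function name, the directory layout, and the
"-expected.txt" suffix are assumptions made for this example, not a description
of the actual test scripts: the search tries each port directory first and
falls back to the generic result.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_expected_result(
    test: str, port_dirs: Iterable[Path], generic_dir: Path
) -> Optional[Path]:
    """Return the first existing expected-result file for `test`, or None.

    Port-specific directories are searched before the generic one, so a port
    can override the shared result for a test it is known to fail.
    """
    relative = Path(test)
    # Hypothetical naming scheme: foo.html -> foo-expected.txt
    expected = relative.with_name(relative.stem + "-expected.txt")
    for base in [*port_dirs, generic_dir]:
        candidate = base / expected
        if candidate.is_file():
            return candidate
    return None
```

With this shape, a port directory only needs to hold results for the tests
whose output differs; everything else is picked up from the generic directory.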

> 2 or 3 test cases (admittedly the easier ones) depend on a certain canonical
> name for the encoding which seems different in ICU and GLib (for example
> windows-1256 != CP1256). Would it be acceptable to modify the test cases here,
> e.g. doing the actual result check with a more tolerant RegExp in JS and print
> PASS/FAIL?

The names actually used on the web are the ones that matter, along with
consistent availability of the same encoding names across platforms. But this
is not yet extensively tested; the test coverage is sparse.

The text encoding libraries often don't match what's needed on the web very
closely. Early in the WebKit project, when we used the Mac-specific "Text
Encoding Converter" library, we had a long list of encoding names that had to
be added explicitly; we didn't even try to use the names in the library itself.
When we switched to the ICU library, we discovered that it had a much better
list of names, so the list of encoding names that needs to be hardcoded in
WebKit itself for ICU is a lot shorter. I imagine that GLib omits many encoding
names that are needed for practical compatibility with the web, so you'll need
code to deal with this.
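
One possible shape for such code is an explicit alias table that maps every
label seen on the web to a single web-facing name, and that name to whatever
the underlying library calls the encoding. The Python sketch below is only an
illustration: the EncodingRegistry class and its methods are hypothetical, and
the single registered entry uses the windows-1256 / CP1256 pair mentioned
above.

```python
from typing import Dict, Optional


class EncodingRegistry:
    """Hypothetical alias table: web labels -> web name -> library name."""

    def __init__(self) -> None:
        self._label_to_web: Dict[str, str] = {}
        self._web_to_library: Dict[str, str] = {}

    @staticmethod
    def _normalize(label: str) -> str:
        # Match labels case-insensitively and ignore surrounding whitespace.
        return label.strip().lower()

    def register(self, web_name: str, library_name: str, *aliases: str) -> None:
        """Register a web-facing name, the library's name, and extra aliases."""
        for label in (web_name, library_name, *aliases):
            self._label_to_web[self._normalize(label)] = web_name
        self._web_to_library[web_name] = library_name

    def web_name(self, label: str) -> Optional[str]:
        """Name to report to web content, or None if the label is unknown."""
        return self._label_to_web.get(self._normalize(label))

    def library_name(self, label: str) -> Optional[str]:
        """Name to pass to the conversion library, or None if unknown."""
        web = self.web_name(label)
        return None if web is None else self._web_to_library[web]


registry = EncodingRegistry()
registry.register("windows-1256", "CP1256", "cp1256")
```

Keeping the web-facing name separate from the library name means the name
reported to pages stays the same regardless of which library does the decoding.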

To see how this is implemented for ICU, look at the
TextCodecICU::registerExtendedEncodingNames function.

To see the remnants of the former mechanism used in older versions of WebKit,
look at the TextCodecMac::registerEncodingNames function and at the
character-sets.txt, mac-encodings.txt, and make-charset-table.pl files in the
platform/text/mac directory.

> Also, some of the tests specifically checking the decoding result for some
> exotic encodings will not pass because libiconv underlying to the GLib routines
> does not support all of them. How can we account for that?

Those test failures are specific examples of how a version of WebKit using the
GLib functions would be deficient compared to versions based on other text
decoding libraries. There are at least three things we can do about this:

    1) Put expected test results in a platform-specific test result directory,
reflecting the known shortcomings of the GLib-based support.

    2) Supplement the GLib decoding with some other mechanism so we can support
more encodings without pulling in an entire library, such as ICU.

    3) Take the existing tests and break them into multiple tests to separate
"important" from "exotic" encodings, so that the tests that cover important
encodings don't have to have platform-specific results. However, this will
require discussion of what encodings are important enough.
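
Option 2 could be structured as a chain of decoders that is tried in order. The
Python sketch below shows the idea under stated assumptions: Python's built-in
codecs stand in for the primary library, the table-driven supplement handles
only single-byte encodings, and "x-hypothetical" is a made-up encoding name
used purely for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

# A decoder returns decoded text, or None if it does not support the encoding.
Decoder = Callable[[bytes, str], Optional[str]]


def primary_decoder(data: bytes, encoding: str) -> Optional[str]:
    """Stand-in for the main library; unsupported encodings yield None."""
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        return None


def make_table_decoder(tables: Dict[str, Dict[int, str]]) -> Decoder:
    """Build a decoder for single-byte encodings described by lookup tables."""

    def decode(data: bytes, encoding: str) -> Optional[str]:
        table = tables.get(encoding.lower())
        if table is None:
            return None
        # Bytes missing from the table become U+FFFD (replacement character).
        return "".join(table.get(byte, "\ufffd") for byte in data)

    return decode


def decode_with_fallback(
    data: bytes, encoding: str, decoders: Sequence[Decoder]
) -> Optional[str]:
    """Try each decoder in order; return the first result that is not None."""
    for decode in decoders:
        text = decode(data, encoding)
        if text is not None:
            return text
    return None


# "x-hypothetical" is an invented encoding with a two-entry table.
supplement = make_table_decoder({"x-hypothetical": {0x41: "A", 0x80: "\u20ac"}})
decoders = [primary_decoder, supplement]
```

The supplement is consulted only when the primary decoder reports that it does
not know the encoding, so encodings the main library already handles are
unaffected.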
