[Webkit-unassigned] [Bug 21990] When a rare EUC-JP character is present, explicitly (and correctly) labelled EUC-JP document is mistreated as Shift_JIS

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Jul 18 04:26:30 PDT 2009


https://bugs.webkit.org/show_bug.cgi?id=21990





--- Comment #7 from O. Andersen <pub-macosforge at coq.no>  2009-07-18 04:26:16 PDT ---
The description in my previous comment was slightly inaccurate. Merging of
7-bit and 8-bit CJK encodings in IE seems to work as follows:

Declared charset -> Actual encoding used, ‘+’ indicating union

HZ -> HZ + GBK
EUC-CN or GBK -> GBK

ISO-2022-JP -> ISO-2022-JP + Windows-31J
Shift_JIS or Windows-31J -> Windows-31J

ISO-2022-KR -> ISO-2022-KR + Windows-949
EUC-KR or Windows-949 -> ISO-2022-KR + Windows-949

In other words:
— 7-bit encodings (HZ, ISO-2022-JP, ISO-2022-KR) are enhanced with the most
popular and comprehensive 8-bit encoding for the same locale (GBK, Windows-31J,
Windows-949);
— for Korean, the 8-bit encoding (Windows-949) is enhanced with the
corresponding 7-bit encoding (ISO-2022-KR) as well; and
— ‘small’ 8-bit encodings (EUC-CN, Shift_JIS, EUC-KR) are treated as their
corresponding ‘large’ superset counterparts (GBK, Windows-31J, Windows-949).
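
The table and rules above can be written down as a lookup, which may be handy when comparing against another implementation. This is only a sketch of the observed behaviour as described in this comment, not IE's actual code; the names `IE_EFFECTIVE_ENCODINGS` and `effective_encodings` are made up for illustration, and the pass-through for unlisted labels is an assumption.

```python
# Declared charset -> encodings IE appears to accept, per the table above.
# A tuple with two entries stands for the union ('+') of both encodings.
# Hypothetical names; this mirrors the observations, not any real API.
IE_EFFECTIVE_ENCODINGS = {
    "hz": ("HZ", "GBK"),
    "euc-cn": ("GBK",),
    "gbk": ("GBK",),
    "iso-2022-jp": ("ISO-2022-JP", "Windows-31J"),
    "shift_jis": ("Windows-31J",),
    "windows-31j": ("Windows-31J",),
    "iso-2022-kr": ("ISO-2022-KR", "Windows-949"),
    "euc-kr": ("ISO-2022-KR", "Windows-949"),
    "windows-949": ("ISO-2022-KR", "Windows-949"),
}


def effective_encodings(declared):
    """Return the encodings observed for a declared charset label.

    Labels not listed in the table are returned unchanged (an assumption;
    the comment above says nothing about them).
    """
    key = declared.strip().lower()
    return IE_EFFECTIVE_ENCODINGS.get(key, (declared,))
```

For example, `effective_encodings("Shift_JIS")` gives `("Windows-31J",)`, while `effective_encodings("EUC-KR")` gives the union `("ISO-2022-KR", "Windows-949")`.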

Obviously, this makes IE more resilient to incorrect encoding declarations,
and the behaviour might be worth replicating in WebKit.
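
To illustrate the resilience point with something reproducible outside IE: a minimal sketch using Python's built-in codecs, where `shift_jis` is the strict JIS X 0208 based codec and `cp932` is Python's name for Windows-31J. The helper `decodes` is hypothetical, and the sample character (U+2460, a circled digit one from the NEC extension rows) is just one convenient example of a Windows-31J-only character.

```python
def decodes(data, codec):
    """Return True if `data` decodes cleanly with the given Python codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# U+2460 exists in Windows-31J (cp932) but not in plain Shift_JIS.
sample = "\u2460".encode("cp932")

strict_ok = decodes(sample, "shift_jis")  # strict interpretation of the label
lenient_ok = decodes(sample, "cp932")     # label treated as its superset
```

A page labelled Shift_JIS that contains such a character fails under the strict codec but decodes under the superset, which is the kind of declaration error the IE behaviour tolerates.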
