[Webkit-unassigned] [Bug 21990] When a rare EUC-JP character is present, explicitly (and correctly) labelled EUC-JP document is mistreated as Shift_JIS
bugzilla-daemon at webkit.org
Sat Jul 18 04:26:30 PDT 2009
https://bugs.webkit.org/show_bug.cgi?id=21990
--- Comment #7 from O. Andersen <pub-macosforge at coq.no> 2009-07-18 04:26:16 PDT ---
The description in my previous comment was slightly inaccurate. Merging of
7-bit and 8-bit CJK encodings in IE seems to work as follows:
Declared charset          -> Actual encoding used (‘+’ indicates union)
HZ                        -> HZ + GBK
EUC-CN or GBK             -> GBK
ISO-2022-JP               -> ISO-2022-JP + Windows-31J
Shift_JIS or Windows-31J  -> Windows-31J
ISO-2022-KR               -> ISO-2022-KR + Windows-949
EUC-KR or Windows-949     -> ISO-2022-KR + Windows-949
In other words:
— 7-bit encodings (HZ, ISO-2022-JP, ISO-2022-KR) are enhanced with the most
popular and comprehensive 8-bit encoding for the same locale (GBK, Windows-31J,
Windows-949);
— for Korean, the 8-bit encoding (Windows-949) is enhanced with the
corresponding 7-bit encoding (ISO-2022-KR) as well; and
— ‘small’ 8-bit encodings (EUC-CN, Shift_JIS, EUC-KR) are treated as their
corresponding ‘large’ superset counterparts (GBK, Windows-31J, Windows-949).
This makes IE more resilient to encoding declaration errors (for example, a
page declared as Shift_JIS that actually uses Windows-31J extensions) and
might be worth replicating.
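
The mapping above can be written down as a simple lookup, sketched here in
Python. This is only an illustration of the observed behaviour described in
this comment, not IE's or WebKit's actual code: the names
IE_EFFECTIVE_ENCODINGS and effective_encodings are hypothetical, the values
are charset labels rather than codec names of any particular library, and the
pass-through for labels outside the table is an assumption.

```python
# Hypothetical table transcribed from the mapping in this comment.
# Keys: declared charset label, lower-cased.
# Values: encodings that appear to be used, treated as a union.
IE_EFFECTIVE_ENCODINGS = {
    "hz": ("HZ", "GBK"),
    "euc-cn": ("GBK",),
    "gbk": ("GBK",),
    "iso-2022-jp": ("ISO-2022-JP", "Windows-31J"),
    "shift_jis": ("Windows-31J",),
    "windows-31j": ("Windows-31J",),
    "iso-2022-kr": ("ISO-2022-KR", "Windows-949"),
    "euc-kr": ("ISO-2022-KR", "Windows-949"),
    "windows-949": ("ISO-2022-KR", "Windows-949"),
}


def effective_encodings(declared):
    """Return the encodings to treat as a union for a declared charset.

    Assumption: labels not listed in the table are passed through
    unchanged as a one-element tuple.
    """
    label = declared.strip()
    return IE_EFFECTIVE_ENCODINGS.get(label.lower(), (label,))
```

A decoder built on this would still need to handle the union itself, for
instance by honouring the 7-bit escape sequences and otherwise falling back
to the 8-bit encoding; that part is not shown here.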