[Webkit-unassigned] [Bug 55441] New: EUC-JP should be CP51932
bugzilla-daemon at webkit.org
Mon Feb 28 19:26:51 PST 2011
https://bugs.webkit.org/show_bug.cgi?id=55441
Summary: EUC-JP should be CP51932
Product: WebKit
Version: 528+ (Nightly build)
Platform: All
OS/Version: All
Status: UNCONFIRMED
Severity: Normal
Priority: P2
Component: Text
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: naruse at airemix.jp
EUC-JP in HTML should be treated as CP51932
= Abstract
HTML5 says EUC-JP should be CP51932.
So WebKit's mapping of EUC-JP should be changed.
http://www.w3.org/TR/html5/parsing.html#character-encodings-0
= EUC-JP variants
== CP51932 (Internet Explorer)
CP51932 is a Japanese EUC variant defined by Microsoft.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special character
* NEC-selected IBM extended character
http://www.iana.org/assignments/charset-reg/CP51932
== EUC-JP by IANA
The "EUC-JP" defined by IANA differs from CP51932. It consists of:
* US-ASCII
* JIS X 0208
* JIS X 0201 Katakana
* JIS X 0212
http://www.iana.org/assignments/character-sets
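As a rough illustration of how the code sets listed above are told apart at the byte level, the following Python sketch splits an EUC-JP byte string into characters and labels each with its code set. It assumes the usual EUC packing (single-shift bytes 0x8E and 0x8F, all other non-ASCII bytes in 0xA1-0xFE). The name split_euc_jp is hypothetical, and no Unicode mapping is performed.

```python
SS2 = 0x8E  # single shift 2: introduces one JIS X 0201 Katakana byte
SS3 = 0x8F  # single shift 3: introduces two JIS X 0212 bytes


def _is_high(b):
    """True for bytes in the 0xA1-0xFE range used by multi-byte sequences."""
    return 0xA1 <= b <= 0xFE


def split_euc_jp(data):
    """Yield (code_set_name, raw_bytes) for each character in data.

    Hypothetical helper: it only inspects byte structure and does not
    check whether a code point is actually assigned in the code set.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            name, size = "US-ASCII", 1
        elif lead == SS2:
            name, size = "JIS X 0201 Katakana", 2
        elif lead == SS3:
            name, size = "JIS X 0212", 3
        elif _is_high(lead):
            name, size = "JIS X 0208", 2
        else:
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (lead, i))
        chunk = data[i:i + size]
        # Every byte after the first must be a high byte.
        if len(chunk) != size or not all(_is_high(b) for b in chunk[1:]):
            raise ValueError("truncated or invalid sequence at offset %d" % i)
        yield name, chunk
        i += size
```

For example, list(split_euc_jp(b"A\xa4\xa2")) yields one US-ASCII character followed by one JIS X 0208 character.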
== Firefox
Firefox uses yet another encoding of its own: CP51932 plus JIS X 0212. It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special character
* NEC-selected IBM extended character
* JIS X 0212
https://bugzilla.mozilla.org/show_bug.cgi?id=600715
== WebKit
WebKit currently seems to use ICU's ibm-33722_P12A_P12A-2004_U2.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* IBM extended characters (IBM's mapping)
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-33722_P12A_P12A-2004_U2&s=ALL
This mapping has some problems:
* it can't decode NEC special characters even if IE sends them
* it can't decode NEC-selected IBM extended characters even if IE sends them
* it encodes and decodes IBM extended characters with IBM's original mapping, which is not part of CP51932
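To make the first two problems concrete, here is a small Python sketch that tells whether a two-byte EUC sequence falls in one of the NEC extension areas. It assumes the commonly documented layout, in which NEC special characters occupy row 13 and NEC-selected IBM extended characters occupy rows 89 to 92 of the JIS X 0208 plane; that layout is not stated in this report, and the name nec_extension_kind is hypothetical.

```python
def nec_extension_kind(pair):
    """Classify a two-byte EUC sequence by NEC extension area.

    Returns a label for sequences in the assumed NEC rows, or None for
    any other two-byte sequence. The row numbers are an assumption
    taken from the commonly documented CP932/CP51932 layout.
    """
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected two bytes in the range 0xA1-0xFE")
    row = pair[0] - 0xA0  # EUC lead byte = 0xA0 + row number
    if row == 13:
        return "NEC special character"
    if 89 <= row <= 92:
        return "NEC-selected IBM extended character"
    return None
```

Under that assumption, sequences with lead byte 0xAD or 0xF9-0xFC are the ones a plain JIS X 0208 table has no assignment for.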
== Chrome
Google Chrome extends this mapping to be compatible with IE and Firefox.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special character
* NEC-selected IBM extended character
* JIS X 0212
* IBM extended characters (IBM's mapping)
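The component lists above can be compared mechanically. The following Python sketch records each variant as a set of the components named in this report and reports what a variant lacks or adds relative to CP51932. It compares component names only, not individual code point mappings, and the names VARIANTS and compare are hypothetical.

```python
ASCII = "US-ASCII"
KANA = "JIS X 0201 Katakana"
X0208 = "JIS X 0208"
NEC = "NEC special character"
NEC_IBM = "NEC-selected IBM extended character"
X0212 = "JIS X 0212"
IBM = "IBM extended characters (IBM's mapping)"

# Component lists as given in the sections above.
VARIANTS = {
    "CP51932": {ASCII, KANA, X0208, NEC, NEC_IBM},
    "EUC-JP by IANA": {ASCII, KANA, X0208, X0212},
    "Firefox": {ASCII, KANA, X0208, NEC, NEC_IBM, X0212},
    "WebKit": {ASCII, KANA, X0208, IBM},
    "Chrome": {ASCII, KANA, X0208, NEC, NEC_IBM, X0212, IBM},
}


def compare(variant, baseline="CP51932"):
    """Return (missing, extra) components of variant relative to baseline."""
    have = VARIANTS[variant]
    want = VARIANTS[baseline]
    return want - have, have - want


if __name__ == "__main__":
    for name in VARIANTS:
        missing, extra = compare(name)
        print(name, "missing:", sorted(missing), "extra:", sorted(extra))
```

Seen this way, WebKit is the only browser variant listed here that lacks components of CP51932.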
= Test page
You can test a browser at http://nalsh.jp/euc.cgi
= Ideal implementation
== Plan A
Use CP51932, which is compatible with IE.
http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm
== Plan B
Use Firefox's variant.
However, Firefox's current variant has the problem described in Mozilla Bug 600715.
https://bugzilla.mozilla.org/show_bug.cgi?id=600715
So a version of it with the JIS X 0212 encoder removed seems suitable.