[Webkit-unassigned] [Bug 55441] New: EUC-JP should be CP51932

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Feb 28 19:26:51 PST 2011


https://bugs.webkit.org/show_bug.cgi?id=55441

           Summary: EUC-JP should be CP51932
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: Text
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: naruse at airemix.jp


EUC-JP in HTML should be treated as CP51932

= Abstract

HTML5 says EUC-JP should be CP51932.
So WebKit's mapping of EUC-JP should be changed.
http://www.w3.org/TR/html5/parsing.html#character-encodings-0

= EUC-JP variants

== CP51932 (Internet Explorer)

CP51932 is a Japanese EUC variant defined by Microsoft.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special characters
* NEC-selected IBM extended characters
http://www.iana.org/assignments/charset-reg/CP51932

== EUC-JP by IANA

CP51932 is different from the "EUC-JP" defined by IANA, which consists of:
* US-ASCII
* JIS X 0208
* JIS X 0201 Katakana
* JIS X 0212
http://www.iana.org/assignments/character-sets
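
As a rough illustration of how the components listed above show up at the byte level, here is a minimal Python sketch that splits an EUC-JP-style byte string by code set. The function name and labels are hypothetical. The lead-byte ranges used for the NEC rows (row 13, and rows 89-92) are not stated in this report; they are an assumption about where CP51932 places those characters. The sketch does not validate trail bytes or decode anything.

```python
def split_euc_jp(data: bytes):
    """Split EUC-JP-style bytes into (sequence, label) pairs by lead byte.

    Illustrative only: trail bytes are not validated, and the NEC row
    ranges below are an assumption, not taken from the bug report.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            seq, label = data[i:i + 1], "US-ASCII"
        elif b == 0x8E:
            # Single shift 2: one byte of JIS X 0201 Katakana follows.
            seq, label = data[i:i + 2], "JIS X 0201 Katakana"
        elif b == 0x8F:
            # Single shift 3: two bytes of JIS X 0212 follow.
            seq, label = data[i:i + 3], "JIS X 0212"
        elif 0xA1 <= b <= 0xFE:
            seq = data[i:i + 2]
            if b == 0xAD:
                # Row 13 of the 94x94 plane (assumed NEC special characters).
                label = "NEC special (row 13)"
            elif 0xF9 <= b <= 0xFC:
                # Rows 89-92 (assumed NEC-selected IBM extended characters).
                label = "NEC-selected IBM (rows 89-92)"
            else:
                label = "JIS X 0208"
        else:
            seq, label = data[i:i + 1], "invalid"
        out.append((seq, label))
        i += len(seq)  # always at least 1, so the loop terminates
    return out
```

A decoder that implements only the IANA definition has no characters assigned in the two NEC ranges, which is where the variants in this report start to disagree.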

== Firefox

Firefox uses yet another encoding of its own: CP51932 plus JIS X 0212.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special characters
* NEC-selected IBM extended characters
* JIS X 0212
https://bugzilla.mozilla.org/show_bug.cgi?id=600715

== WebKit

Current WebKit seems to use ICU's ibm-33722_P12A_P12A-2004_U2.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* IBM extended characters (IBM's mapping)
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-33722_P12A_P12A-2004_U2&s=ALL

This mapping has some problems:
* It can't decode NEC special characters even if IE sends them.
* It can't decode NEC-selected IBM extended characters even if IE sends them.
* It encodes and decodes IBM extended characters using IBM's original mapping, which is not part of CP51932.
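
One way to check this kind of decoding gap outside a browser is to feed the same bytes to several decoders and see which ones accept them. The sketch below uses Python's built-in codecs purely as stand-ins. They are not WebKit's, IE's, or ICU's tables, and Python has no CP51932 codec, so the results say nothing definite about any browser. The sample bytes 0xAD 0xA1 are assumed to be the first cell of the NEC special character row, and the function name is hypothetical.

```python
def probe_decoders(sample: bytes, codec_names):
    """Try to decode `sample` with each named codec.

    Returns a dict mapping codec name to the decoded text, or None when
    that codec rejects the bytes.
    """
    results = {}
    for name in codec_names:
        try:
            results[name] = sample.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # Assumed NEC special character position (row 13, cell 1).
    sample = b"\xad\xa1"
    for name, text in probe_decoders(sample, ["euc_jp", "euc_jis_2004"]).items():
        print(name, repr(text))
```

A codec that reports None for a sequence another implementation sends is showing the same kind of mismatch as the first two problems listed above.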

== Chrome

Google Chrome extends this mapping to be compatible with IE and Firefox.
It consists of:
* US-ASCII
* JIS X 0201 Katakana
* JIS X 0208
* NEC special characters
* NEC-selected IBM extended characters
* JIS X 0212
* IBM extended characters (IBM's mapping)

= Test page

You can test a browser at http://nalsh.jp/euc.cgi

= Ideal implementation

== Plan A

Use CP51932 and be compatible with IE.
http://cpansearch.perl.org/src/NARUSE/Encode-EUCJPMS-0.07/ucm/cp51932.ucm

== Plan B

Use Firefox's mapping.
However, Firefox's current mapping has a problem described in Mozilla Bug 600715.
https://bugzilla.mozilla.org/show_bug.cgi?id=600715
So a variant of Firefox's mapping with the JIS X 0212 encoder removed seems suitable.
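
To make the Plan B idea concrete, here is a minimal Python sketch of an asymmetric codec: JIS X 0212 stays decodable, but the encoder never emits it. It uses Python's built-in euc_jp codec as a stand-in table, which is neither CP51932 nor Firefox's mapping. The function name and the "?" replacement are hypothetical choices; the report does not say what the encoder should do with such characters.

```python
def encode_without_jisx0212(text: str, replacement: bytes = b"?") -> bytes:
    """Encode text as EUC-JP but never emit code set 3 (0x8F-prefixed) bytes.

    Characters that the stand-in codec can only express through
    JIS X 0212, or cannot express at all, are replaced.
    """
    out = bytearray()
    for ch in text:
        try:
            encoded = ch.encode("euc_jp")
        except UnicodeEncodeError:
            encoded = replacement
        else:
            # A leading 0x8F marks a three-byte JIS X 0212 sequence.
            if encoded[0] == 0x8F:
                encoded = replacement
        out += encoded
    return bytes(out)
```

Decoding is left untouched in this sketch, so incoming pages that contain JIS X 0212 sequences would still be readable while form submissions would avoid sending them.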
