[Webkit-unassigned] [Bug 21990] New: When a rare EUC-JP character is present, explicitly (and correctly) labelled EUC-JP document is mistreated as Shift_JIS
bugzilla-daemon at webkit.org
Thu Oct 30 16:24:57 PDT 2008
https://bugs.webkit.org/show_bug.cgi?id=21990
Summary: When a rare EUC-JP character is present, explicitly (and correctly) labelled EUC-JP document is mistreated as Shift_JIS
Product: WebKit
Version: 528+ (Nightly build)
Platform: All
URL: http://www.google.com/search?hl=en&inlang=ja&ie=EUC-JP&oe=EUC-JP&q=%8F%A2%C3&btnG=Search
OS/Version: All
Status: NEW
Severity: Normal
Priority: P2
Component: Page Loading
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: jshin at chromium.org
BugsThisDependsOn: 16482
1. Go to
http://www.google.com/search?hl=en&inlang=ja&ie=EUC-JP&oe=EUC-JP&q=%8F%A2%C3&btnG=Search
(the page is explicitly and correctly labelled as EUC-JP in the HTTP Content-Type header field).
2. You'd see '召テ' instead of '¦'.
3. The latter (BROKEN BAR, U+00A6) is the three-byte EUC-JP sequence 0x8F 0xA2 0xC3, where 0x8F is the SS3 prefix that introduces a JIS X 0212 character.
The Japanese encoding detector in TextResourceDecoder.cpp is fooled by the 0x8F byte and misdetects the document as Shift_JIS.
I think the logic for invoking the Japanese encoding detector is too liberal:
if (m_source != UserChosenEncoding && m_source != AutoDetectedEncoding && encoding().isJapanese())
No encoding detector is perfect, and I'd rather not invoke any encoding detector (Unicode BOM detection being a possible exception) for documents with an explicit charset declaration (HTTP header or meta tag). After resolving bug 16482 (ICU encoding detector hook-up), I'll revisit this issue.