[Webkit-unassigned] [Bug 30437] New: Japanese Text Search Problem
bugzilla-daemon at webkit.org
Fri Oct 16 04:36:41 PDT 2009
https://bugs.webkit.org/show_bug.cgi?id=30437
Summary: Japanese Text Search Problem
Product: WebKit
Version: 528+ (Nightly build)
Platform: All
URL: http://limechat.net/report/webkit-search-problem.html
OS/Version: All
Status: UNCONFIRMED
Severity: Normal
Priority: P2
Component: WebCore Misc.
AssignedTo: webkit-unassigned at lists.webkit.org
ReportedBy: psychs at limechat.net
== Summary ==
In Japanese, 'ぁ' and 'あ' are always treated as different characters, as are
'か' and 'が'.
But Safari and Chrome treat each pair as the same character in their text
search.
== Description ==
In English, abc and ABC are treated as the same in a case-insensitive context
such as an application search.
But in Japanese, "あった" and "あつた", for example, are different words in every
context, because in Japanese 'っ' is NOT considered a small form of 'つ'.
These characters are never treated as the same character.
In the current Unicode Collation Algorithm, っ and つ compare as equal at the
primary collation strength, and WebKit uses ICU's primary collation strength
for its search.
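The effect of such a comparison can be approximated with a small Python sketch. It is a toy model, not ICU and not WebKit code: `primary_like_key` and the `SMALL_TO_LARGE` table are hypothetical names, and the folding rules (case, combining marks, small kana) are assumptions chosen only to mimic the distinctions this report says are ignored.

```python
import unicodedata

# Hypothetical table: small hiragana mapped to their full-size counterparts.
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

def primary_like_key(text):
    """Toy stand-in for a primary-strength key (not ICU's real weights)."""
    # Fold case, then decompose so that が becomes か + U+3099.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop combining marks, which also drops the kana voicing marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Finally treat small kana as their full-size forms.
    return base.translate(SMALL_TO_LARGE)

def primary_like_equal(a, b):
    """Compare two strings under the toy key."""
    return primary_like_key(a) == primary_like_key(b)
```

Under this model `primary_like_equal("あった", "あつた")` is true, which is the behavior reported here as wrong for Japanese readers.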
I reported this problem to the Unicode mailing list.
(http://unicode.org/mail-arch/unicode-ml/y2009-m10/0019.html)
Mark Davis replied to my report.
(http://unicode.org/mail-arch/unicode-ml/y2009-m10/0022.html)
> UTS#10 does not necessarily match the sorting of any particular language.
This means ICU's search function cannot be used directly for application
searches. The collation table needs tailoring for some languages.
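As a rough illustration of what tailoring is meant to achieve, the Python sketch below keeps the loose matching for Latin text (case and accents ignored) while leaving kana distinctions intact. It is a simplified model under stated assumptions, not ICU's collation rule syntax and not the rules in the patch: `tailored_key`, `tailored_contains`, and the `KANA_MARKS` set are hypothetical names.

```python
import unicodedata

# Assumption: the kana voicing marks stay significant in this sketch.
KANA_MARKS = {"\u3099", "\u309A"}

def tailored_key(text):
    """Toy search key: ignore case and Latin accents, keep kana distinct."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    kept = "".join(
        ch for ch in decomposed
        # Drop combining marks except the kana voicing marks.
        if not unicodedata.combining(ch) or ch in KANA_MARKS
    )
    # Recompose so that か + U+3099 becomes が again.
    return unicodedata.normalize("NFC", kept)

def tailored_contains(haystack, needle):
    """Return True if needle occurs in haystack under the toy key."""
    return tailored_key(needle) in tailored_key(haystack)
```

With this key, searching for "あった" no longer matches "あつた", while "CAFE" still matches "café", which reflects the goal of changing Japanese matching without affecting other languages.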
I wrote a patch for WebKit that adds the following tailoring rules for Japanese
text search. The patch does not cause any regressions in other languages.