[Webkit-unassigned] [Bug 30437] New: Japanese Text Search Problem

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Oct 16 04:36:41 PDT 2009


https://bugs.webkit.org/show_bug.cgi?id=30437

           Summary: Japanese Text Search Problem
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
               URL: http://limechat.net/report/webkit-search-problem.html
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: WebCore Misc.
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: psychs at limechat.net


== Summary ==

In Japanese, 'ぁ' and 'あ' are always treated as different characters. The same
is true of 'か' and 'が'.

But Safari and Chrome treat them as the same character in their text search.

== Description ==

As you know, in English abc and ABC are treated as the same in a
case-insensitive context such as an application's text search.

But in Japanese, for example, "あった" and "あつた" are different words in every
context, because in Japanese semantics 'っ' is NOT considered a small form of
'つ'. These characters are never treated as the same character.

In the current Unicode Collation Algorithm, っ and つ have the same weight at
the primary collation strength, so they compare as equal at that level. WebKit
uses ICU's primary collation strength for its search.
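
To illustrate the effect, here is a rough sketch in Python. It is NOT ICU and
NOT WebKit's code. It only imitates primary-strength matching by dropping
combining marks and folding small kana to full-size kana. The table
SMALL_TO_FULL and the functions primary_like_key, strict_key and matches are
hypothetical names made up for this illustration, and the table covers only a
few hiragana.

```python
import unicodedata

# Hypothetical table for this sketch: small hiragana -> full-size hiragana.
# It imitates these pairs sharing one weight at the primary level.
SMALL_TO_FULL = str.maketrans("ぁぃぅぇぉっゃゅょ", "あいうえおつやゆよ")


def primary_like_key(text):
    """Toy approximation of a primary-strength key (not ICU's algorithm)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks, e.g. the voiced sound mark U+3099 in 'が'.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold small kana into their full-size counterparts.
    return base.translate(SMALL_TO_FULL)


def strict_key(text):
    """Toy key that keeps small kana and voiced marks distinct."""
    return unicodedata.normalize("NFC", text)


def matches(needle, haystack, key):
    """Substring search after mapping both strings through `key`."""
    return key(needle) in key(haystack)


if __name__ == "__main__":
    print(matches("あつた", "そこにあった", primary_like_key))  # True
    print(matches("あつた", "そこにあった", strict_key))        # False
```

With the primary-like key, a search for "あつた" finds "あった", which is the
behavior reported here. With the strict key it does not.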

I reported this problem on the Unicode mailing list.
(http://unicode.org/mail-arch/unicode-ml/y2009-m10/0019.html)

Mark Davis replied to my report.
(http://unicode.org/mail-arch/unicode-ml/y2009-m10/0022.html)
> UTS#10 does not necessarily match the sorting of any particular language.

This means we cannot use ICU's search function as-is for application searches.
The collation table needs some tailoring for some languages.

I wrote a patch for WebKit that adds tailoring rules for Japanese text search.
This patch does not cause any regressions in other languages.
