[Webkit-unassigned] [Bug 8738] Text should be always normalized to NFC

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Mar 23 21:06:18 PDT 2007


http://bugs.webkit.org/show_bug.cgi?id=8738





------- Comment #6 from robburns1 at mac.com  2007-03-23 21:06 PDT -------
Regarding bug 

There has been some related discussion on bug 13150. Keep
in mind that there are several strategies for normalization, as outlined at:

<http://www.unicode.org/unicode/reports/tr15/#Canonical_Equivalence>

There it says:

Strategy A (ensuring "that each system component respects canonical
equivalence") "is the most robust, but may be less efficient."

Not only is this the most robust, but it strikes me that this would be the
"Apple/WebKit/KDE" way.

This would rely on low-level text handling classes that treat in-memory
strings and substrings as canonically equivalent where appropriate, without
serializing or deserializing a normalized form of the string. An approach like
this is probably already required for Normalization Form Compatibility
Decomposition (NFKD). Similar measures are also required for other
"decompositions" and relations between characters (uppercase, lowercase, ...).
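
To make that concrete, here is a minimal sketch of comparisons that respect
canonical equivalence without touching the stored strings. It uses Python's
standard unicodedata module purely for illustration; the function names are
hypothetical and are not WebKit or platform text-system API. The caseless
variant loosely follows the NFD(casefold(NFD(x))) shape of canonical caseless
matching, and a real implementation would avoid normalizing whole strings on
every comparison.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent; neither input is modified."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equal(a: str, b: str) -> bool:
    """True if a and b match after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


def caseless_canonically_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison that also respects canonical equivalence."""
    def key(s: str) -> str:
        # Decompose, fold case, then decompose again because case folding
        # can produce sequences that are no longer in NFD.
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)


precomposed = "\u00e9"   # e with acute, one code point
decomposed = "e\u0301"   # e followed by a combining acute accent

print(precomposed == decomposed)                   # raw code point comparison
print(canonically_equal(precomposed, decomposed))  # equivalence-aware comparison
```

The stored strings keep whatever form they arrived in; only the comparison
keys are normalized, which is the point of Strategy A.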

So I think WebKit should follow this approach (perhaps much of this relies
on the text system that the platform is built on, and maybe WebKit is already
accessing the text system in just the right way):

• Strategy 'A' cited above. In other words, don't change the stored or
input text; instead, handle canonical normalization, along with
compatibility normalization and other string processing issues, independently
of the stored/input text.
• For web editing, input should preserve the characters as entered (whether
those are compatibility characters or canonical-equivalent characters).
• When input is not explicitly a compatibility character, the core
(non-compatibility) Unicode character should be used.
• When a glyph exists for a canonical-equivalent character (and none exists for
the stored or input character), the view should render the canonical-equivalent
character's glyph.
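
The last point could be sketched roughly as follows. This is only an
illustration under simplifying assumptions: font coverage is modeled as a
plain set of characters, only the stored, NFC and NFD forms are tried, and
the helper name pick_renderable_form is hypothetical. A real text system
would consult the font's character map and work per grapheme cluster.

```python
import unicodedata
from typing import Optional, Set


def pick_renderable_form(stored: str, font_coverage: Set[str]) -> Optional[str]:
    """Return the first canonically equivalent form the font can draw.

    The stored text is tried first and is never modified; the NFC and NFD
    forms are used only as rendering fallbacks.
    """
    candidates = [
        stored,
        unicodedata.normalize("NFC", stored),
        unicodedata.normalize("NFD", stored),
    ]
    for form in candidates:
        if all(ch in font_coverage for ch in form):
            return form
    return None  # no equivalent form is covered; caller falls back to another font


# Hypothetical font that has "e" and a combining acute, but no precomposed glyph.
coverage = {"e", "\u0301"}
form = pick_renderable_form("\u00e9", coverage)
print([hex(ord(ch)) for ch in form])
```

Only the view's choice of glyphs changes here; the stored character stays
exactly as it was input.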

Again, I'm not sure how much of this is handled by the text system and how much
WebKit handles on its own, but these issues should be discussed, understood and
considered when addressing this bug.

