[Webkit-unassigned] [Bug 8738] New: Text should always be normalized to NFC

bugzilla-daemon at opendarwin.org bugzilla-daemon at opendarwin.org
Thu May 4 13:01:27 PDT 2006


           Summary: Text should always be normalized to NFC
           Product: WebKit
           Version: 420+ (nightly)
          Platform: Macintosh
               URL: http://www.w3.org/TR/charmod-norm/#C302
        OS/Version: Mac OS X 10.4
            Status: NEW
          Severity: normal
          Priority: P3
         Component: HTML DOM
        AssignedTo: webkit-unassigned at opendarwin.org
        ReportedBy: ap at nypop.com

From Character Model for the World Wide Web 1.0: Normalization (W3C Working
Draft 27 October 2005):

C302 A text-processing component that receives suspect text MUST NOT perform
any normalization-sensitive operations unless it has first either confirmed
through inspection that the text is in normalized form or it has re-normalized
the text itself. Private agreements MAY, however, be created within private
systems which are not subject to these rules, but any externally observable
results MUST be the same as if the rules had been obeyed.
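
To illustrate what C302 asks for, here is a minimal Python sketch (not part of
the W3C text and not WebKit code). It treats string comparison as the
normalization-sensitive operation and assumes NFC is the target form; the
helper names ensure_nfc and safe_equals are hypothetical.

```python
import unicodedata


def ensure_nfc(suspect: str) -> str:
    """Return the text in NFC, re-normalizing only if inspection fails."""
    normalized = unicodedata.normalize("NFC", suspect)
    # Inspection step: if the text was already NFC, hand back the original.
    return suspect if normalized == suspect else normalized


def safe_equals(a: str, b: str) -> bool:
    """Compare two suspect strings only after both are known to be NFC."""
    return ensure_nfc(a) == ensure_nfc(b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

naive_result = precomposed == decomposed            # code-point comparison
checked_result = safe_equals(precomposed, decomposed)
```

The naive comparison treats the two spellings as different strings, while the
checked comparison treats them as equal.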

C303 A text-processing component which modifies text and performs
normalization-sensitive operations MUST behave as if normalization took place
after each modification, so that any subsequent normalization-sensitive
operations always behave as if they were dealing with normalized text.

EXAMPLE: If the 'z' is deleted from the (normalized) string cz¸ (where '¸'
represents a combining cedilla, U+0327), normalization is necessary to turn the
denormalized result c¸ into the properly normalized ç. If the software that
deletes the 'z' later uses the string in a normalization-sensitive operation,
it needs to normalize the string before this operation to ensure correctness;
otherwise, normalization may be deferred until the data is exposed. Analogous
cases exist for insertion and concatenation (e.g. xf:concat(xf:substring('cz¸',
1, 1), xf:substring('cz¸', 3, 1)) in XQuery [XQuery Operators]).
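
The deletion example can be reproduced with a short Python sketch (an
illustration only, assuming NFC as the normalized form; the is_nfc helper is
hypothetical).

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


original = "cz\u0327"                 # 'c', 'z', COMBINING CEDILLA (U+0327)
original_is_nfc = is_nfc(original)    # no precomposed z-with-cedilla exists

# Delete the 'z': the cedilla now follows 'c', which has a precomposed form.
edited = original[0] + original[2]
edited_is_nfc = is_nfc(edited)

# Re-normalizing composes 'c' + U+0327 into U+00E7 (c with cedilla).
renormalized = unicodedata.normalize("NFC", edited)
```

Here original is already normalized, edited is not, and renormalized is the
single code point U+00E7.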

NOTE: Software that denormalizes a string such as in the deletion example above
does not need to perform a potentially expensive re-normalization of the whole
string to ensure that the string is normalized. It is sufficient to go back to
the last non-composing character and re-normalize forward to the next
non-composing character; if the string was normalized before the denormalizing
operation, it will now be re-normalized.
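
The local re-normalization described in the NOTE might look roughly like the
following Python sketch. It rests on a loudly stated assumption: a character
with canonical combining class 0 is treated as "non-composing". That is only an
approximation, since some class 0 characters (Hangul jamo, for instance) can
still compose with a preceding character, so a real implementation would need
a stricter boundary test. The function names are hypothetical.

```python
import unicodedata


def is_boundary(ch: str) -> bool:
    # ASSUMPTION: combining class 0 approximates "non-composing character".
    return unicodedata.combining(ch) == 0


def delete_and_renormalize(text: str, index: int) -> str:
    """Delete text[index] from NFC text and re-normalize only nearby text."""
    edited = text[:index] + text[index + 1:]

    # Go back to the last boundary character before the edit point.
    start = min(index, len(edited))
    while start > 0:
        start -= 1
        if is_boundary(edited[start]):
            break

    # Go forward to the next boundary character at or after the edit point.
    end = index
    while end < len(edited) and not is_boundary(edited[end]):
        end += 1

    # Only the window between the two boundaries is re-normalized.
    window = unicodedata.normalize("NFC", edited[start:end])
    return edited[:start] + window + edited[end:]


result = delete_and_renormalize("abcz\u0327d", 3)   # delete the 'z'
full = unicodedata.normalize("NFC", "abc\u0327d")   # whole-string reference
```

For this input the windowed result matches whole-string normalization, while
only the two characters around the edit are passed to the normalizer.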

WebKit doesn't perform any Unicode normalization (and I'm going to file a
separate bug about text that WebKit produces).
