[Webkit-unassigned] [Bug 210502] New: [GTK] TextNode::splitText() can lose content

Tue Apr 14 09:42:09 PDT 2020

https://bugs.webkit.org/show_bug.cgi?id=210502

            Bug ID: 210502
           Summary: [GTK] TextNode::splitText() can lose content
           Product: WebKit
           Version: Other
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: WebKitGTK
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: mcrha at redhat.com
                CC: bugs-noreply at webkitgtk.org

Created attachment 396429

  --> https://bugs.webkit.org/attachment.cgi?id=396429&action=review

How it looks in Firefox

I just noticed that calling splitText() in the middle of a multi-unit Unicode character (one stored as two UTF-16 code units) causes content to be lost on both sides of the split. This is with trunk at r259630.

Steps:
a) run: MiniBrowser --editor-mode
b) open the Inspector and in its console run: document.body.innerText = "<three Emojis>"
c) still in the inspector run: document.body.firstChild.splitText(2)
   * all is fine, the Elements tab shows the text properly split into one and two Emojis
d) still in the inspector run: document.body.firstChild.nextSibling.splitText(1)
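
The steps above can be approximated outside the browser. This is only a sketch, assuming splitText() offsets count UTF-16 code units the same way String.prototype.slice() does. splitAt() is a hypothetical helper, and U+1F600 is a stand-in for the first Emoji; the other two match the charCodeAt() values reported in this bug.

```javascript
// Stand-in for the text node content: three Emojis, two UTF-16 code units each.
// U+1F600 is a hypothetical first Emoji; U+1F609 and U+1F642 match the
// charCodeAt() values reported in this bug.
const text = "\u{1F600}\u{1F609}\u{1F642}";

// Hypothetical helper mimicking splitText() on a plain string:
// returns [head, tail], cut at a UTF-16 code unit offset.
function splitAt(str, offset) {
  return [str.slice(0, offset), str.slice(offset)];
}

const [first, rest] = splitAt(text, 2);   // step c): cut between two Emojis
const [second, third] = splitAt(rest, 1); // step d): cut inside a surrogate pair

console.log(first.length, second.length, third.length); // 2 1 3
console.log(second.charCodeAt(0), third.charCodeAt(0)); // 55357 56841
```

No code unit is dropped by the cut itself; the second and third pieces merely begin or end with half of a pair.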

After d), the body contains three text nodes: the first shows the first Emoji, the second appears to be empty text, and the third probably holds two characters, though they look like whitespace:

   document.body.firstChild.nextSibling.nodeValue.length
   1
   document.body.firstChild.nextSibling.nodeValue.charCodeAt(0)
   55357

   document.body.firstChild.nextSibling.nextSibling.nodeValue.length
   3
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(0)
   56841
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(1)
   55357
   document.body.firstChild.nextSibling.nextSibling.nodeValue.charCodeAt(2)
   56898
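
The values above can be read as UTF-16 surrogates. A minimal sketch of that reading follows; the helper names are made up for illustration, and only the standard surrogate ranges and the pairing formula are assumed.

```javascript
// Hypothetical helpers for classifying UTF-16 code units.
const isHighSurrogate = (unit) => unit >= 0xd800 && unit <= 0xdbff;
const isLowSurrogate = (unit) => unit >= 0xdc00 && unit <= 0xdfff;

// Combine a high/low surrogate pair into the code point it encodes.
function codePointFromPair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

// Second text node: a single high surrogate without its partner.
console.log(isHighSurrogate(55357)); // true

// Third text node: a stray low surrogate followed by one complete pair.
console.log(isLowSurrogate(56841)); // true
console.log(codePointFromPair(55357, 56898).toString(16)); // 1f642

// The pair that step d) cut in half.
console.log(codePointFromPair(55357, 56841).toString(16)); // 1f609
```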

I do not know what to expect from this, but being able to break "a letter" in the middle and have it lost completely, together with the next letter, is not ideal.
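
One possible caller-side workaround, not something WebKit does, is to adjust the offset before splitting so that it never lands inside a surrogate pair. safeSplitOffset() below is a hypothetical helper; moving the offset back rather than forward is an arbitrary choice of this sketch.

```javascript
// Hypothetical helper: if `offset` falls between the two halves of a
// surrogate pair, move it back by one so the whole pair stays in the tail.
function safeSplitOffset(text, offset) {
  if (offset <= 0 || offset >= text.length) {
    return offset; // nothing on one side of the cut, so nothing to break
  }
  const before = text.charCodeAt(offset - 1);
  const after = text.charCodeAt(offset);
  const splitsPair =
    before >= 0xd800 && before <= 0xdbff && // high surrogate before the cut
    after >= 0xdc00 && after <= 0xdfff;     // low surrogate after the cut
  return splitsPair ? offset - 1 : offset;
}

const twoEmojis = "\u{1F609}\u{1F642}";
console.log(safeSplitOffset(twoEmojis, 1)); // 0
console.log(safeSplitOffset(twoEmojis, 2)); // 2
console.log(safeSplitOffset(twoEmojis, 3)); // 2
```

In the Inspector this could be used as node.splitText(safeSplitOffset(node.nodeValue, 1)). It only protects surrogate pairs; composite Emojis built from several code points can still be split apart.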

Notes:
 - Calling document.body.normalize() restores the state as it was after step b).
 - The splitText() call itself seems to be correct (see above); it is the visual interpretation that is broken (at least the second Emoji should be visible rather than looking like whitespace).
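
That normalize() restores the text is consistent with no code unit being lost: the two nodes each hold an unpaired surrogate, and joining them makes the string well-formed again. A small sketch of that check, where hasLoneSurrogate() is a hypothetical helper and plain string concatenation stands in for normalize():

```javascript
// Hypothetical helper: true if the string contains a surrogate code unit
// that is not part of a proper high+low pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN when past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a complete pair
        continue;
      }
      return true; // high surrogate without a low one
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate without a preceding high one
    }
  }
  return false;
}

// Node contents as reported by charCodeAt() after step d).
const secondNode = String.fromCharCode(55357);
const thirdNode = String.fromCharCode(56841, 55357, 56898);

console.log(hasLoneSurrogate(secondNode), hasLoneSurrogate(thirdNode)); // true true
console.log(hasLoneSurrogate(secondNode + thirdNode)); // false
```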

I tried Firefox (67.0) and it behaves similarly (also two characters per Emoji), but there the splitText() call has no impact on the visual interpretation in the document body. It only affects the Inspector, which shows characters it cannot visualize as rectangles containing the hex code.

-------------------------------------------

Side notes:

Are there any writing systems that use such multi-unit characters in ordinary text, like some Chinese variants or similar?

That an Emoji occupies two characters is also impractical for line length calculations, even though it is drawn as a single character. I know of "composite" Emojis, which are an even bigger nightmare on many fronts.
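
To illustrate both side notes, here is a rough sketch comparing the different "lengths" of a string. The sample characters are chosen for illustration only: U+20BB7 is a CJK Extension B ideograph, which also lies outside the Basic Multilingual Plane, and the last sample is a composite Emoji built with zero-width joiners. codePointLength() is a hypothetical helper name.

```javascript
const emoji = "\u{1F609}"; // one Emoji from this report
const cjk = "\u{20BB7}";   // a CJK Extension B ideograph
// Composite Emoji: three Emojis joined by U+200D (zero-width joiner).
const family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";

// Count code points instead of UTF-16 code units; string iteration
// keeps surrogate pairs together.
const codePointLength = (str) => Array.from(str).length;

console.log(emoji.length, codePointLength(emoji));   // 2 1
console.log(cjk.length, codePointLength(cjk));       // 2 1
console.log(family.length, codePointLength(family)); // 8 5

// Where Intl.Segmenter is available, it can count user-perceived
// characters (grapheme clusters) instead.
if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  console.log(Array.from(segmenter.segment(family)).length);
}
```

Counting code points fixes the simple Emoji and CJK cases, but the composite Emoji still counts as five, so line length calculations would need grapheme segmentation to treat it as one drawn character.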
