[webkit-dev] DOM tree traversal on detached nodes

Wed Jun 6 18:14:11 PDT 2012

[Summary]

What values should span.parentNode and span.firstChild return in the
following code? (test html <http://haraken.info/null/ref_count2.html>).

  div = document.createElement("div");
  document.body.appendChild(div);
  div.innerHTML = '<p><p><p><span id="span"><br><br><br>text</span></p></p></p>';
  span = document.getElementById("span");
  div.innerHTML = "";
  alert(span);  // <span>
  alert(span.parentNode);  // ???
  alert(span.firstChild);  // ???

(a) span.parentNode = <p>, span.firstChild = <br>
(b) span.parentNode = null, span.firstChild = <br>
(c) span.parentNode = <p>, span.firstChild = null
(d) span.parentNode = null, span.firstChild = null
(e) Any value is OK (i.e. the behavior is UNDEFINED)
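
To make the options easier to compare across browsers, here is a small sketch that maps an observed result onto (a)-(d). The name classifyBehavior is hypothetical and not part of any browser API; it only checks whether each value is null.

```javascript
// Hypothetical helper (not a browser API): maps an observed
// (parentNode, firstChild) pair onto options (a)-(d) from the list above.
function classifyBehavior(parentNode, firstChild) {
  const hasParent = parentNode !== null;
  const hasChild = firstChild !== null;
  if (hasParent && hasChild) return "a";   // both links survive
  if (!hasParent && hasChild) return "b";  // parent link lost, children kept
  if (hasParent && !hasChild) return "c";  // parent kept, children lost
  return "d";                              // both links lost
}
```

In the test page it could be called as classifyBehavior(span.parentNode, span.firstChild) right after div.innerHTML = "".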

[Behavior in browsers]

Safari 5.1.7: (b)
Chrome 20.0: (b)
Firefox 12.0: (a)
Opera 11.64: (a)
IE 9: (d)

[How WebKit behaves as (b)]

The behavior is caused by the reference counting algorithm of Node objects
(TreeShared.h <http://code.google.com/codesearch#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/platform/TreeShared.h&exact_package=chromium&q=treeshared.h&type=cs>).
In the TreeShared algorithm, a Node X is destructed if the ref-count of X
is 0 and X's parent is NULL. So div.innerHTML = "" causes the following
steps:

(0) The ref-counts of the three <p>s are 0.
(1) div.innerHTML = "" is executed.
(2) The parent of the first <p> becomes NULL. The first <p> is destructed.
(3) The parent of the second <p> becomes NULL. The second <p> is destructed.
(4) The parent of the third <p> becomes NULL. The third <p> is destructed.

On the other hand, <span> and the <br>s are not destructed because <span> is
referenced from the JS side and thus the parent of the first <br> does not
become NULL. Note that "X is destructed if the ref-count of X is 0 and X's
parent is NULL" implies that "If X has a ref count, then all the nodes
under X are kept alive". That's why the <p>s are destructed but <span> and
the <br>s are not destructed.
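
The steps above can be sketched with a toy model. This is not the real TreeShared.h code: ToyNode, refCount and destructIfUnreferenced are made-up names, the three <p>s are assumed to form a simple parent-child chain, and only one <br> is modeled. It applies only the rule "destruct X if the ref-count of X is 0 and X's parent is NULL".

```javascript
// Toy model of the rule described above; NOT the actual WebKit implementation.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.refCount = 0;      // references held from outside the tree (e.g. JS)
    this.parent = null;
    this.children = [];
    this.destructed = false;
  }
  appendChild(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
  // Detach every child; each detached child is then checked for destruction.
  removeAllChildren() {
    const detached = this.children;
    this.children = [];
    for (const child of detached) {
      child.parent = null;
      child.destructIfUnreferenced();
    }
  }
  // Destruct only if ref-count is 0 and parent is null, then release children.
  destructIfUnreferenced() {
    if (this.refCount > 0 || this.parent !== null) return;
    this.destructed = true;
    this.removeAllChildren();
  }
}

const div = new ToyNode("div");
const p1 = div.appendChild(new ToyNode("p"));
const p2 = p1.appendChild(new ToyNode("p"));
const p3 = p2.appendChild(new ToyNode("p"));
const span = p3.appendChild(new ToyNode("span"));
const br = span.appendChild(new ToyNode("br"));

span.refCount = 1;        // the JS variable `span` holds a reference
div.removeAllChildren();  // stands in for div.innerHTML = ""

console.log(p1.destructed, p2.destructed, p3.destructed); // true true true
console.log(span.parent, span.children[0].name);          // null br
```

In this model the span ends up with a null parent but keeps its <br> child, which corresponds to behavior (b).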

[Other weird behaviors]

Behavior (b) is weird, and it causes other subtle issues, for example in
editing. Consider the following code (test html
<http://haraken.info/null/ref_count4.html>):

  <div contentEditable>
  a<p>b<p>c<p>d<span id="span">e<br>f<br>g<br>h</p>i</p>j</p>k
  </div>
  </body>
  <script>
  span = document.getElementById("span");
  setTimeout(function () {
    // Please manually delete the texts in <div> within 10 seconds
    alert("span = " + span);  // <span>
    alert("span.parentNode = " + span.parentNode);  // <p>
    alert("span.parentNode.parentNode = " + span.parentNode.parentNode);  // null
    alert("span.firstChild = " + span.firstChild);  // "e"
  }, 10000);
  </script>

I am not sure why span.parentNode returns <p>
but span.parentNode.parentNode returns null. Maybe an undo stack keeps a
reference to <p>?

Here is another example. According to the behavior (b), the following
result makes sense (test html <http://haraken.info/null/ref_count3.html>):

  <html><body><p><span id="span"><br></span></p></body>
  <script>
  span = document.getElementById("span");
  document.body.innerHTML = "";
  alert("span = " + span);  // <span>
  alert("span.parentNode = " + span.parentNode);  // null
  alert("span.firstChild = " + span.firstChild);  // <br>
  </script>
  </html>

However, if we omit </span> and </p>, the result changes (test html
<http://haraken.info/null/ref_count3-2.html>):

  <html><body><p><span id="span"><br></body>
  <script>
  span = document.getElementById("span");
  document.body.innerHTML = "";
  alert("span = " + span);  // <span>
  alert("span.parentNode = " + span.parentNode);  // <p>
  alert("span.firstChild = " + span.firstChild);  // <br>
  </script>
  </html>

Maybe the HTML parser has a list of not-yet-closed tags and the tag entry
keeps a reference to <p>? I am not sure.

Anyway, the point is that the behavior (b) is UNDEFINED from the
perspective of JS programmers. The behavior depends on what JS objects are
being used and what data structures are implicitly being allocated in
WebCore.

[Discussion]

First of all, it seems that the behavior is not defined in the spec.

IMHO, (a) would be the best semantics. The semantics of (a) is very
straightforward from the perspective of JS programmers, i.e. "Reachable DOM
nodes from JS are kept alive". On the other hand, (b) and (c) are not good
in that the semantics is UNDEFINED (i.e. the semantics depends on the
implementation details). Consequently, in terms of the semantics, WebKit
might want to change the current behavior to (a).
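
As a rough illustration of what "Reachable DOM nodes from JS are kept alive" would mean, here is a minimal sketch of reachability over parent and child links. The helpers makeNode, append, detach and reachableFrom are hypothetical, the nodes are plain objects rather than real DOM nodes, and the sketch says nothing about how an engine would implement this efficiently.

```javascript
// Plain-object stand-ins for DOM nodes; all names here are hypothetical.
function makeNode(name) {
  return { name, parent: null, children: [] };
}
function append(parent, child) {
  child.parent = parent;
  parent.children.push(child);
  return child;
}
// Unlink a node from its parent without touching the node's own subtree.
function detach(node) {
  const parent = node.parent;
  if (parent === null) return;
  parent.children = parent.children.filter((c) => c !== node);
  node.parent = null;
}
// Everything reachable from the JS-held roots via parent/child links is live.
function reachableFrom(roots) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const node = stack.pop();
    if (live.has(node)) continue;
    live.add(node);
    if (node.parent !== null) stack.push(node.parent);
    for (const child of node.children) stack.push(child);
  }
  return live;
}

function demoReachability() {
  const container = makeNode("div");
  const p1 = append(container, makeNode("p1"));
  const p2 = append(p1, makeNode("p2"));
  const p3 = append(p2, makeNode("p3"));
  const span = append(p3, makeNode("span"));
  append(span, makeNode("br"));
  detach(p1);                          // stands in for div.innerHTML = ""
  const live = reachableFrom([span]);  // JS only holds `span`
  return [...live].map((n) => n.name).sort();
}

console.log(demoReachability()); // [ 'br', 'p1', 'p2', 'p3', 'span' ]
```

Under this model the whole detached subtree stays intact, so span.parentNode would still be a <p> and span.firstChild would still be a <br>, which is behavior (a).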

That being said, I am not sure if the semantics is practically important in
the real world. As explained above, behavior (b) can indeed cause a number
of weird bugs, but they would be "rare" cases. In fact, considering that
IE, Firefox, Opera, Chrome and Safari have been behaving differently, the
current confusing semantics has not caused a big practical issue. This
would imply that the behavior does not matter in the real world. If you
know of any bugs caused by the behavior, I would be very happy to hear
about them.

[Why I am discussing this]

I've been designing V8 GC for DOM objects. I investigated a couple of
ideas, one of which requires the semantics that "Reachable DOM nodes are
kept alive". The semantics is required to reclaim DOM objects safely in the
current generational V8 GC. In addition, I would emphasize that the
semantics will also simplify JSC GC. The semantics will fix the
FIXME <http://code.google.com/codesearch#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/bindings/js/JSNodeCustom.cpp&exact_package=chromium&q=jsnodecustom&type=cs&l=112> in
JSC GC. Another benefit of the semantics is that it will naturally solve
the weird WebKit behaviors that I explained above. If we could reach a
consensus that the behavior (a) is expected, I would like to discuss how to
achieve the behavior (a) without extra overhead.

In conclusion, my question is... what behavior is expected?

Thanks!

-- 
Kentaro Hara, Tokyo, Japan (http://haraken.info)